Abstract 

The objective of the present paper is to introduce the concept of a spatially inhomogeneous 
linear inverse problem which, to the best of the author's knowledge, has never been considered 
previously in a statistical framework but is emerging due to a variety of practical applications. The 
special feature of the problem is that the degree of ill-posedness depends not only on the scale 
but also on the location. In this case, the rates of convergence are determined by the interaction of 
four parameters: the smoothness and spatial homogeneity of the unknown function $f$ and the degrees 
of ill-posedness and spatial inhomogeneity of the operator $Q$. An interesting property here is that, if 
the operator $Q$ is weakly inhomogeneous, then the rates of convergence are not influenced by the spatial 
inhomogeneity of $Q$ and coincide with the rates which are usual for homogeneous linear 
inverse problems. On the other hand, if $Q$ is moderately or strongly inhomogeneous, 
the convergence rates are significantly affected by the degree of spatial inhomogeneity. 

Estimators obtained in the paper are based either on a wavelet-vaguelette decomposition (if 
the norms of all vaguelettes are finite) or on a hybrid of the wavelet-vaguelette decomposition and 
the Galerkin method (if vaguelettes in the neighborhood of the singularity point have infinite norms). 
The hybrid estimator is a combination of a linear part in the vicinity of the singularity point and 
a nonlinear block thresholding wavelet estimator elsewhere. To attain adaptivity, an optimal 
resolution level for the linear, singularity-affected portion of the estimator is obtained using 
the Lepskii (1990, 1999) method. Subsequently, this resolution level is used as the lowest resolution 
level for the nonlinear wavelet estimator. We show that the convergence rates of the hybrid estimator 
lie within a logarithmic factor of the optimal minimax convergence rates. 

The theory presented in the paper is supplemented by examples of deconvolution with a spatially 
inhomogeneous kernel and deconvolution in the presence of locally extreme noise or extremely 
inhomogeneous design. The first two problems are examined via a limited simulation study which 
demonstrates advantages of the hybrid estimator when the degree of spatial inhomogeneity is 
high. In addition, we apply the technique to recovery of a convolution signal transmitted via 
amplitude modulation. 

Keywords and phrases: Statistical linear inverse problems, inhomogeneous, minimax convergence 
rates, singularity 
1 Introduction 
1.1 Formulation 

Let $Q$ be a known linear operator on a Hilbert space $H$ with inner product $\langle \cdot, \cdot \rangle$. The objective is 
to recover $f \in H$ from observations on 

$$y(x) = (Qf)(x) + \sqrt{\varepsilon}\, W(x), \quad x \in X, \qquad (1.1)$$

where $W(x)$ is the white noise process and $\sqrt{\varepsilon}$ is the noise level. Assume that observations can be taken 
as functionals of $y$ 

$$\langle y, g \rangle = \langle Qf, g \rangle + \sqrt{\varepsilon}\, \xi(g), \quad g \in H, \qquad (1.2)$$

where $\xi(g)$ is a Gaussian random variable with zero mean and variance $\|g\|^2$ such that $E\,\xi(g_1)\xi(g_2) = 
\langle g_1, g_2 \rangle$. In what follows, $\|\cdot\|$ denotes the $L^2$ norm; all other norms are explicitly marked. 

Model (1.1) is a common representation of a linear inverse problem with Gaussian noise 
and has been studied by many authors (see, e.g., Abramovich and Silverman (1998), Bissantz et 
al. (2007), Cavalier and Golubev (2006), Cavalier et al. (2002), Cohen et al. (2004), Hoffmann, 
and Reis (2004), Donoho (1995), Golubev (2010), Hoffmann and Reiss (2008), Kalifa and Mallat 
(2003), and Mair and Ruymgaart (1996), among others). A typical assumption in the problem 
above is that operator $Q$ acts uniformly over the spaces of functions represented at a common scale 
independently of the location of a function. In particular, consider a set of "test" functions 

$$\psi_{h,a}(x) = h^{-1/2}\, \psi\left(\frac{x-a}{h}\right), \qquad (1.3)$$

where $\psi(x)$, $x \in [0,1]$, has a bounded support $(L_\psi, U_\psi)$ and unit $L^2$ norm $\|\psi\| = 1$. Then, functions 
$\psi_{h,a}(x)$ have scale $h$, supports concentrated around $x = a$ and unit norms. Conditions which are 
commonly imposed on operator $Q$ imply that it contracts the norms of all functions $\psi_{h,a}$ uniformly, 
i.e., the value of $\|Q\psi_{h,a}\|$ depends considerably on the scale $h$ but hardly at all on $a$. Moreover, if 
there exist $(Q^*)^{-1}\psi_{h,a}$, where $Q^*$ is the adjoint of operator $Q$, then the values of $\|(Q^*)^{-1}\psi_{h,a}\|$ follow 
the same pattern. However, not all linear operators necessarily have those properties. 

In order to illustrate the discussion above, consider a linear operator $Q$ with the adjoint $Q^*$ 
given by 

$$(Qf)(x) = \mu(x) \int_0^x f(t)\,dt, \qquad (Q^*v)(x) = \int_x^1 \mu(z) v(z)\,dz, \qquad (1.4)$$

where $\mu(x)$ is a smooth function. Assume that function $\psi$ in (1.3) is continuously differentiable and 
integrates to zero: $\int \psi(z)\,dz = 0$. Denote $\Psi(z) = \int_{L_\psi}^{z} \psi(x)\,dx$ and observe that $\Psi(z) = 0$ whenever 
$z \notin (L_\psi, U_\psi)$. Then, direct calculations yield 

$$(Q\psi_{h,a})(x) = h^{1/2} \mu(x)\, \Psi\left(\frac{x-a}{h}\right), \qquad ((Q^*)^{-1}\psi_{h,a})(x) = -\frac{h^{-3/2}}{\mu(x)}\, \psi'\left(\frac{x-a}{h}\right),$$

so that 

$$\|Q\psi_{h,a}\|^2 = h^2 \int \mu^2(a + hz)\, \Psi^2(z)\,dz = h^2 \left[\mu^2(a) \|\Psi\|^2 + o(1)\right], \quad (h \to 0),$$

$$\|(Q^*)^{-1}\psi_{h,a}\|^2 = h^{-2} \int \mu^{-2}(a + hz)\, [\psi'(z)]^2\,dz.$$

If $\mu(y)$ is a constant or, at least, $C_\mu^{-1} \le \mu(y) \le C_\mu$ for some relatively small $C_\mu$, then the dependence 
of $\|Q\psi_{h,a}\|$ and $\|(Q^*)^{-1}\psi_{h,a}\|$ on $a$ can be ignored, and equation (1.1) with $Q$ given by (1.4) can be 
treated as a spatially homogeneous problem. However, if $\mu(y)$ varies significantly, the dependence on $a$ 
becomes essential and equation (1.1) is a spatially inhomogeneous inverse problem. 

Dependence on $a$ becomes even more extreme if $\mu(y)$ vanishes at some point $x_0 \in (0,1)$, e.g., 
$\mu^2(x) = C_\alpha |x - x_0|^\alpha$. Indeed, in this case, $x_0$ is the singularity point and it is easy to show that 
$\|Q\psi_{h,x_0}\|^2 \asymp h^{2+\alpha}$ and 

$$\|(Q^*)^{-1}\psi_{h,x_0}\|^2 \asymp \begin{cases} h^{-(2+\alpha)}, & \text{if } \alpha < 1, \\ \infty, & \text{if } \alpha \ge 1. \end{cases}$$

Hence, if $\alpha$ is large, the dependence of $\|Q\psi_{h,a}\|$ and $\|(Q^*)^{-1}\psi_{h,a}\|$ on location becomes quite extreme. 
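
As a numerical illustration of the scaling discussed above, the following minimal sketch discretizes the operator $(Qf)(x) = \mu(x)\int_0^x f(t)\,dt$ with $\mu^2(x) = |x - x_0|^\alpha$ on a grid and estimates the exponent $r$ in $\|Q\psi_{h,a}\|^2 \asymp h^r$ from two scales. The particular test function, the grid size, the location $x_0 = 0.5$ and all function names are assumptions made for this illustration only; the test function is left unnormalized since a constant factor does not change the exponent.

```python
import numpy as np

def psi(z):
    """Illustrative C^1 test function supported on [-1, 1] that integrates to zero
    (an assumption of this sketch; not normalized)."""
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) < 1.0, -6.0 * z * (1.0 - z**2) ** 2, 0.0)

def q_norm_sq(h, a, alpha, x0=0.5, n=400001):
    """Approximate ||Q psi_{h,a}||^2 on [0, 1] for (Qf)(x) = mu(x) * int_0^x f(t) dt
    with mu(x)^2 = |x - x0|^alpha, using the trapezoid rule."""
    x = np.linspace(0.0, 1.0, n)
    dx = x[1] - x[0]
    f = psi((x - a) / h) / np.sqrt(h)
    # cumulative trapezoid rule for F(x) = int_0^x f(t) dt
    F = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * dx)))
    g2 = np.abs(x - x0) ** alpha * F**2
    return float(np.sum(0.5 * (g2[1:] + g2[:-1]) * dx))

def scaling_exponent(alpha, a, h=0.1, x0=0.5):
    """Empirical exponent r in ||Q psi_{h,a}||^2 ~ h^r, estimated from scales h and h/2."""
    ratio = q_norm_sq(h, a, alpha, x0) / q_norm_sq(h / 2.0, a, alpha, x0)
    return float(np.log2(ratio))
```

Under these assumptions, `scaling_exponent(2.0, a=0.5)` (test function centered at the singularity) should be close to $2 + \alpha = 4$, while `scaling_exponent(2.0, a=0.2)` (away from the singularity) should be close to 2.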

Since wavelets provide an adequate tool for scale-location representations of functional spaces, 
it is convenient to introduce spatially inhomogeneous linear inverse problems using the wavelet-vaguelette 
decomposition proposed by Donoho (1995). In particular, in the case when $H = L^2(\mathcal{D})$, 
$\mathcal{D} \subset \mathbb{R}$, Donoho's assumptions appear as follows: 

(D1) There exist three sets of functions: $\{\psi_{jk}\}$, an orthonormal wavelet basis of $H$, and nearly 
orthogonal sets $\{u_{jk}\}$ and $\{v_{jk}\}$ such that 

$$Q\psi_{jk} = v_{jk}, \qquad Q^* u_{jk} = \psi_{jk}, \qquad \|v_{jk}\| = \lambda_j,$$

where $\lambda_j$ depend on resolution index $j$ but not on spatial index $k$. 

(D2) $u_{jk}$ and $v_{jk}$ are such that $\langle u_{j_1 k_1}, v_{j_2 k_2} \rangle = \delta_{j_1 j_2} \delta_{k_1 k_2}$. 

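(D3) Sets $\{u_{jk}\}$ and $\{v_{jk}\}$ are nearly orthogonal, i.e., for any sequence $\{a_{jk}\} \in l^2$ one has 

$$\Big\| \sum_{j,k} a_{jk} \lambda_j u_{jk} \Big\|^2 \asymp \sum_{j,k} a_{jk}^2, \qquad \Big\| \sum_{j,k} a_{jk} \lambda_j^{-1} v_{jk} \Big\|^2 \asymp \sum_{j,k} a_{jk}^2.$$
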
Under conditions (D1)-(D3), $f$ can be recovered using the reproducing formula 

$$f = \sum_{j,k} \langle Qf, u_{jk} \rangle\, \psi_{jk}, \qquad (1.5)$$

which is analogous to the reproducing formula for the SVD. Assumptions (D1)-(D3) are quite 
standard. Indeed, similar assumptions were introduced in Cavalier et al. (2002), Cavalier and 
Golubev (2006), Golubev (2010) and Knapik et al. (2011). The common premise is that operator 
$Q$ acts "uniformly" over subspaces of $H$, so singular values or their surrogate equivalents depend 
on the resolution level only but not on location. If $V_j = \mathrm{Span}\{\psi_{ik},\ i \le j,\ k \in \mathbb{Z}\}$ is the subspace of 
functions at resolution level $j$, the above assumptions reduce to a common assumption of the Galerkin 
method (see, e.g., Cohen et al. (2004) or Hoffmann and Reiss (2008)) that on subspace $V_j$ operator 
$Q$ has a bounded inverse with the norm dependent on $j$ only, i.e., there exist $\lambda_j > 0$ such that 

$$\sup_{j \ge 0}\, \lambda_j \|Q^{-1}\|_{V_j \to V_j} < \infty, \qquad (1.6)$$

which is very similar to the combination of assumptions (D1) and (D3) above. 

Note that both assumption (D1) and assumption (1.6) imply that any function $v \in V_j$ with $\|v\| = 1$ has 
an inverse image, the norm of which is bounded by a constant which is independent of the support 
of $v$. In this sense, operator $Q$ is an ill-posed spatially homogeneous operator. In the present paper, 
we shall be interested in a different situation when assumptions (D1) and (D3) may not be true. 
In particular, we assume that the norms of the inverse images of $\psi_{jk}$ depend on the spatial index 
$k$ and may be unbounded, i.e., condition (D1) and possibly condition (D3) are violated. We shall 
refer to such linear inverse problems as spatially inhomogeneous, in contrast to spatially 
homogeneous problems which satisfy conditions (D1)-(D3) above. 

1.2 Motivation 

Spatially inhomogeneous ill-posed problems appear naturally in the case when either the noise level 
is spatially dependent or observations are irregularly spaced. Problems of this kind have been 
considered before, both theoretically and in practical applications. Nevertheless, in former studies, 
it was always assumed that the noise level is uniformly bounded above or the design density of 
observations is bounded away from zero. The situations investigated in the present paper rather 
refer to locally extreme noise and extremely inhomogeneous design (which can also be described as 
a local data loss). Traditionally, in the first situation, measurements are treated as outliers and 
are removed from further analysis, while the second one is dealt with as a case of missing data. 
There are, however, multiple ill-posed problems where data quality varies and preserving all data 
for further analysis appears to be a prudent choice. Problems of this sort include deconvolution of LIDAR 
signals (see, e.g., Harsdorf and Reuter (2000) and Gurdev et al. (2002)) or astronomical images 
(see, e.g., Starck et al. (2002) and Weddell and Webb (2008)), and analysis of forensic data (see, e.g., 
Li and Satta (2011)). The approach suggested in the present paper provides an alternative to the missing 
data techniques which are usually applied in this case. 

In addition, spatially inhomogeneous ill-posed problems arise in engineering or mathematical 
physics whenever the kernel is spatially inhomogeneous, as it occurs in the case of the amplitude 
modulation which is applied for transmitting information in the form of electro-magnetic waves. 

Below we consider some examples in more detail. 

Example 1 Deconvolution of LIDAR signals. LIDAR (Light Detection And Ranging or Laser 
Imaging Detection And Ranging) is an optical remote sensing technology that can measure the 
distance to, or other properties of, targets by illuminating the target with laser light and analyzing 
the backscattered light. LIDAR technology has applications in archaeology, geography, geology, 
geomorphology, seismology, forestry, remote sensing and atmospheric physics. The LIDAR data model is 
described mathematically by the convolution equation P = R * Pg where P is the time-resolved LIDAR 
signal, Pg is the impulse response function and R is the system response function to be determined 
(see, e.g., Harsdorf and Reuter (2000) and Gurdev et al. (2002)). However, if the system response 
function of the LIDAR is longer than the time resolution interval, then the measured LIDAR signal is 
blurred and the effective accuracy of the LIDAR decreases. This loss of precision becomes extreme 
when, for example, LIDAR is used for emergency response and natural disaster management 
such as assessment of the extent of damage due to volcanic eruptions or forest fires. This is due to 
the presence of dust, smoke and other obstructions which the LIDAR signal cannot penetrate. In this 
situation, distances are routinely calculated by filtering the data set (removing outliers) and 
applying interpolation techniques. However, keeping all existing data and accounting for extreme 
noise may improve the precision of the analysis of LIDAR signals. 

Example 2 Deconvolution of astronomical data. Deconvolution of astronomical images has 
proven in some cases to be crucial for extracting scientific content. For example, deconvolved mid-infrared 
images are used to reveal the inner structure of an active galactic nucleus hidden at lower 
wavelengths because of the high extinction. Also, research on gravitational lenses is easier and 
more efficient when applying deconvolution methods (see, e.g., Starck et al. (2002) and references 
therein). In addition, deconvolution is also crucial in order to take full advantage of the increasing 
number of high-quality ground-based telescopes, for which images 
are strongly limited in resolution by the seeing. 

Analysis of astronomical images is usually formulated as a two-dimensional deconvolution 
problem with the spatial impulse response function, commonly referred to as the point spread 
function (PSF), as a kernel and an additive noise. Extreme measurement errors are ubiquitous in 
astronomy. Common sources of measurement error are the Poissonian nature of photon counts, 
instrumental noise, and calibration. In addition to the ever-present effect of noise from imaging 
equipment and optical defects from instrumentation, images from ground-based telescopes are distorted 
by wavefront aberrations caused by atmospheric turbulence. The PSF which is used to 
represent such distortions can either be applied over the entire image, or within regions uniquely 
defined by the isoplanatic angle. The combination of such regions forms an extended image, where 
the spatially variant PSF is used for image restoration (see, e.g., Weddell and Webb (2008)). Both 
situations lead to a two-dimensional version of the model considered in Section 7.1, where large degrees 
of spatial inhomogeneity correspond to extreme distortions. 

Example 3 Amplitude Modulation. Amplitude Modulation (AM) is a way of transmitting information 
in the form of electro-magnetic waves. In AM, a radio wave known as the "carrier" or 
"carrier wave" is modulated in amplitude by the signal that is to be transmitted, while the frequency 
remains constant (see, e.g., Miller et al. (2009)). In video or image transmission (such 
as TV), where the base-band signal has an inherently large bandwidth, AM is usually preferred to Frequency 
Modulation (FM) systems since the latter require additional bandwidth. Since, in 
AM, signal information is "stored" in the amplitude, which is affected by noise, AM is more susceptible 
to noise than FM. Mathematically, the problem reduces to multiplying the transmitted signal by 
the function $\mu(x) = \cos(2\pi\omega x - \theta)$ with large $\omega \approx n/2$ and $\theta \in [0, 2\pi]$. In Section 8.2 we provide 
an in-depth description of the application of the methodology developed in the paper to recovery of a 
convolution signal transmitted via AM. 

1.3 Objectives and layout of the paper 

The objective of the present paper is to introduce the concept of a spatially inhomogeneous linear 
inverse problem which, to the best of the author's knowledge, has never been considered previously 
in a statistical framework but is emerging due to a variety of practical applications. It turns out that 
spatially inhomogeneous problems exhibit properties which are very different from their spatially 
homogeneous counterparts. In particular, if the norms of vaguelettes $u_{jk} = (Q^*)^{-1}\psi_{jk}$ are infinite 
in the vicinity of a singularity point, reproducing formula (1.5) ceases to work and the usual wavelet-vaguelette 
estimators cannot be applied. In this case, we propose a hybrid estimator which is based 
on a combination of the wavelet-vaguelette decomposition and the Galerkin method. We study two applications 
of the general theory: deconvolution with spatially inhomogeneous design and deconvolution 
with a spatially inhomogeneous kernel (the case of heterogeneous noise being a particular case of 
the latter). 

Another interesting feature of the model is that the rates of convergence are determined by the 
interaction of four parameters: the smoothness and spatial homogeneity of the unknown function $f$ 
and the degrees of ill-posedness and spatial inhomogeneity of operator $Q$. In particular, if operator $Q$ is 
weakly inhomogeneous, then the rates of convergence are not influenced by the spatial inhomogeneity 
of operator $Q$ and coincide with the rates which are usual for homogeneous linear inverse problems. 

In what follows, we assume that operator $Q$ in (1.1) is completely known. If, in practical 
applications, this is not true, one has to account for the extraneous errors which stem from the 
uncertainty in the operator $Q$ by using, for example, ideas of Hoffmann and Reiss (2008). Also, 
to simplify our considerations, we limit our study to the case when $X = [0,1]$, $H = L^2[0,1]$, 
and $k$ is a scalar. The theory presented below can be generalized to the case when $H = L^2(\mathcal{D})$, 
$\mathcal{D} \subset \mathbb{R}^d$ and $k$ is a $d$-dimensional vector. This extension should be relatively straightforward if one 
is dealing with isotropic Besov spaces but becomes much more interesting and involved in the case 
of anisotropic Besov spaces (see, e.g., Kerkyacharian, Lepski and Picard (2001)). However, we leave 
those extensions for future investigations since considering them below would prevent us from focusing 
on the main objective of the paper. 

The rest of the paper is organized as follows. Section 2 introduces the concept of a spatially 
inhomogeneous ill-posed problem and formulates major definitions and assumptions which are used 
throughout the paper. Section 3 presents the asymptotic minimax lower bounds for the $L^2$-risk 
of the estimators of the solution of the problem over a wide range of Besov balls. Section 4 discusses 
estimation strategies, in particular, partitioning the unknown response function $f$ 
and its estimator into the singularity-affected and the singularity-free parts, the main idea at the 
core of the hybrid estimator. Section 5 elaborates on the risk of the estimator constructed in the 
previous section when the lowest resolution level in the singularity-affected portion of the estimator is fixed. 
Section 6 discusses the adaptive choice of the lowest resolution level and derives 
the asymptotic minimax upper bounds for the $L^2$-risk. In Section 7 we consider two examples 
of spatially inhomogeneous ill-posed problems: deconvolution with a spatially inhomogeneous 
operator (Section 7.1), which can be viewed as a version of a deconvolution equation with spatially 
inhomogeneous noise, and deconvolution based on an irregularly spaced sample (Section 7.2). Section 8 
presents a limited simulation study of deconvolution with heteroscedastic noise which demonstrates 
advantages of the hybrid estimator when the degree of spatial inhomogeneity is high. Section 8 
also studies the application of the hybrid estimator to recovery of a convolution signal transmitted 
via Amplitude Modulation. Section 9 concludes the paper with a discussion. Finally, Section 10 
contains the proofs of the statements in the earlier sections. 



2 Spatially inhomogeneous ill-posed problem: assumptions and 
definitions 

Consider a scaling function $\varphi$ and a corresponding wavelet $\psi$ with bounded supports and form an 
orthonormal wavelet basis $\{\psi_{jk}\}$ of $L^2([0,1])$. We further impose the following set of assumptions 
on the spatially inhomogeneous operator $Q$. 



(A1) There exist functions $\{u_{j,k}\}$ and $\{v_{j,k}\}$ such that 

$$Q\psi_{jk} = v_{j,k}, \qquad Q^* u_{j,k} = \psi_{jk}, \qquad (2.1)$$

where $\|v_{j,k}\| = \lambda_{j,k} < \infty$. 



(A2) There exist a singularity point $x_0 \in (0,1)$ and a constant $D \ge 0$ such that $\|u_{j,k}\| = \infty$ 
if $|k - k_{0j}| < D$ and, for any $\{a_{jk}\}$, $k = 0, \ldots, 2^j - 1$, one has 

$$\Big\| \sum_{|k - k_{0j}| \ge D} a_{jk} \lambda_{j,k} u_{j,k} \Big\|^2 \le C_u \sum_{|k - k_{0j}| \ge D} a_{jk}^2, \qquad (2.2)$$

where $C_u < \infty$ is independent of $j$ and $k_{0j} = 2^j x_0$ is the parameter corresponding to location $x_0$ 
($k_{0j}$ is not necessarily an integer). 



(A3) Functions $v_{j,k}$ are such that the inequality 

$$\Big\| \sum_{k=0}^{2^j-1} a_{jk} \lambda_{j,k}^{-1} v_{j,k} \Big\|^2 \le C_v \sum_{k=0}^{2^j-1} a_{jk}^2 \qquad (2.3)$$

holds for any $\{a_{jk}\}$, $k = 0, \ldots, 2^j - 1$, where $C_v < \infty$ is independent of $j$. 



Note that Assumptions (A1)-(A3) are much weaker than Assumptions (D1)-(D3) above. 
First, $\lambda_{j,k}$ depends not only on the resolution level but also on the location of the wavelet coefficient. 
Observe also that, even if $D = 0$, assumptions (A2) and (A3) are weaker than assumption (D3) 
since the sums are taken over one resolution level only. Moreover, if $D > 0$, then, in the neighborhood 
of the singularity point $x_0$, wavelet coefficients cannot be recovered directly since $\|u_{j,k}\| = \infty$, and 
we say that operator $Q$ has a singularity at $x_0$. 
Since one usually starts the wavelet expansion at some finite resolution level $m$, below we list an 
extra assumption which mirrors Assumption (A2) and can be derived from it: 

(A4) There exist functions $\{t_{m,k}\}$ and positive constants $C_t$ and $D_0$ independent of $m$ such 
that for any $\{a_{mk}\}$, $k = 0, \ldots, 2^m - 1$, 

$$\Big\| \sum_{|k - k_{0m}| \ge D_0} a_{mk} \lambda_{m,k} t_{m,k} \Big\|^2 \le C_t \sum_{|k - k_{0m}| \ge D_0} a_{mk}^2. \qquad (2.4)$$

If $D = D_0 = 0$ in Assumptions (A2) and (A4), then $\|u_{j,k}\| < \infty$ and $\|t_{m,k}\| < \infty$ for any $k$. 
Hence, $f$ can be expressed using reproducing formula (1.5) which, in this case, becomes 

$$f(x) = \sum_{k=0}^{2^m-1} a_{mk} \varphi_{mk}(x) + \sum_{j=m}^{\infty} \sum_{k=0}^{2^j-1} b_{jk} \psi_{jk}(x), \qquad (2.5)$$

where $a_{mk} = \langle Qf, t_{m,k} \rangle$ and $b_{jk} = \langle Qf, u_{j,k} \rangle$. If $D > 0$, reproducing formula (2.5) does not work 
and one needs an alternative way of recovering $f$. Indeed, if $Qf$ in the expressions for $a_{mk}$ and $b_{jk}$ is 
replaced by $y = Qf + \sqrt{\varepsilon}\, W$, then the variances of the wavelet coefficients in the vicinity of the singularity 
$x_0$ are infinite: $\mathrm{Var}\langle y, u_{j,k} \rangle = \infty$ if $|k - k_{0j}| < D$, and a similar consideration applies to $t_{m,k}$. For 
this reason, at each resolution level, we partition the set of all indices into the singularity-affected 
indices 

$$K_{0m} = \{k = 0, \ldots, 2^m - 1 : |k - k_{0m}| < D_0\}, \qquad K_{1j} = \{k = 0, \ldots, 2^j - 1 : |k - k_{0j}| < D\}$$

and the singularity-free indices 

$$K^c_{0m} = \{k : 0 \le k \le 2^m - 1,\ k \notin K_{0m}\}, \qquad K^c_{1j} = \{k : 0 \le k \le 2^j - 1,\ k \notin K_{1j}\}.$$

To be specific, in what follows, we assume that $\lambda_{j,k}$ are such that, for some positive constants 
$\alpha$, $\beta$, $C_{\lambda 0}$ and $C_\lambda$ independent of $j$ and $k$, one has 

$$C_{\lambda 0}\, 2^{-j(\alpha+\beta)} (1 + |k - k_{0j}|)^\alpha \le \lambda_{j,k}^2 \le C_\lambda\, 2^{-j(\alpha+\beta)} (1 + |k - k_{0j}|)^\alpha. \qquad (2.6)$$

We shall refer to coefficients $\beta$ and $\alpha$ in (2.6) as degrees of ill-posedness and spatial inhomogeneity, 
respectively. Observe that, with $\lambda_{j,k}$ satisfying condition (2.6), the variances of the coefficients 
at the lower resolution levels may be significantly higher than the variances of the coefficients at 
higher resolution levels as long as the locations of the lower resolution level coefficients lie in close 
proximity to a singularity point. 

In the present paper, we consider estimation of a solution of inhomogeneous linear inverse 
problems in the case when the unknown function $f$ is possibly spatially inhomogeneous itself, in 
particular, $f$ belongs to a Besov ball $B^s_{p,q}(A)$ of radius $A$. The interplay between the spatial inhomogeneity 
of operator $Q$ and the properties of $f$ leads to various very interesting phenomena. In particular, if $\alpha$ is 
small or $p$ and $\beta$ are relatively large, spatial inhomogeneity does not affect convergence rates and $f$ 
can be recovered as well as in the case of $\alpha = 0$. 

Remark 1 (Multiple singularity points) Note that one can consider a spatially inhomogeneous 
problem with multiple singularity points $x_{0,1} < x_{0,2} < \cdots < x_{0,L}$ and corresponding constants 
$D_1, \ldots, D_L$, where $L < \infty$ and $x_{0,i} - x_{0,i-1} \ge \delta > 0$ for some fixed positive $\delta$. The theory developed 
below can be easily extended to this case, with the convergence rates of the estimators determined 
by the "worst case scenario" among singular points $x_{0,i}$, $i = 1, \ldots, L$. 
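
The index partition and the location-dependent behavior of $\lambda_{j,k}$ introduced in this section can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the singularity-affected set is taken as $\{k : |k - 2^m x_0| < D\}$, and the constants in (2.6) are set to one so that `lambda_sq_model` returns $2^{-j(\alpha+\beta)}(1 + |k - k_{0j}|)^\alpha$.

```python
def singularity_index_sets(m, x0, D):
    """Split k = 0, ..., 2^m - 1 into singularity-affected indices (|k - k0| < D)
    and singularity-free indices (all the rest); k0 = 2^m * x0 need not be an integer."""
    k0 = (2 ** m) * x0
    affected = [k for k in range(2 ** m) if abs(k - k0) < D]
    free = [k for k in range(2 ** m) if abs(k - k0) >= D]
    return affected, free

def lambda_sq_model(j, k, x0, alpha, beta):
    """Model value of lambda_{j,k}^2 suggested by (2.6), with both constants set to 1
    (an assumption of this sketch)."""
    k0 = (2 ** j) * x0
    return 2.0 ** (-j * (alpha + beta)) * (1.0 + abs(k - k0)) ** alpha
```

For instance, with $m = 4$, $x_0 = 0.3$ and $D = 2$ one has $k_{0m} = 4.8$, so the affected indices are 3, 4, 5 and 6; with $D = 0$ the affected set is empty and every coefficient can be recovered directly.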
3 Minimax lower bounds for the risk over Besov balls 

Before constructing an estimator of the unknown function $f$ under model (1.1), we derive the 
asymptotic minimax lower bounds for the $L^2$-risk over a wide range of Besov balls. 

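Recall that for an $r_0$-regular multiresolution analysis (see, e.g., Meyer (1992), pp. 21-25), with 
$0 < s < r_0$, and for a Besov ball 

$$B^s_{p,q}(A) = \left\{ f \in L^p([0,1]) : f \in B^s_{p,q},\ \|f\|_{B^s_{p,q}} \le A \right\}$$

of radius $A > 0$ with $1 \le p, q \le \infty$ and $s' = s + 1/2 - 1/p$, one has 

$$B^s_{p,q}(A) = \left\{ f \in L^p([0,1]) : \Big( \sum_{k=0}^{2^m-1} |a_{mk}|^p \Big)^{1/p} + \Big( \sum_{j=m}^{\infty} 2^{js'q} \Big( \sum_{k=0}^{2^j-1} |b_{jk}|^p \Big)^{q/p} \Big)^{1/q} \le A \right\}, \qquad (3.1)$$
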



with respective sum(s) replaced by maximum if $p = \infty$ and/or $q = \infty$ (see, e.g., Johnstone et al. 
(2004)). We study below the minimax $L^2$-risk over Besov balls $B^s_{p,q}(A)$ defined as 

$$R_\varepsilon(B^s_{p,q}(A)) = \inf_{\tilde f} \sup_{f \in B^s_{p,q}(A)} E\|\tilde f - f\|^2,$$

where the infimum is taken over all possible square-integrable estimators $\tilde f$ of $f$ based on $y$ from 
model (1.1). 

In what follows, we use the symbol $C$ for a generic positive constant, which may take different 
values at different places and is independent of the noise level $\varepsilon$. The following statement provides 
the asymptotic minimax lower bounds for the $L^2$-risk. 

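Theorem 1 Let $1 \le p, q \le \infty$ and $s > \max(1/p, 1/2)$. Then, under Assumptions (A1)-(A3), as 
$\varepsilon \to 0$, 

$$R_\varepsilon(B^s_{p,q}(A)) \ge C \Delta(\varepsilon), \qquad (3.2)$$

where 

$$\Delta(\varepsilon) = \begin{cases} A^{\frac{2(\alpha+\beta)}{2s'+\alpha+\beta}}\, \varepsilon^{\frac{2s'}{2s'+\alpha+\beta}}, & \text{if } 2s(\alpha - 1) > (\beta+1)(1 - 2/p), \\[4pt] A^{\frac{2(\beta+1)}{2s+\beta+1}}\, \varepsilon^{\frac{2s}{2s+\beta+1}}, & \text{if } 2s(\alpha - 1) \le (\beta+1)(1 - 2/p). \end{cases} \qquad (3.3)$$
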



Remark 2 (Convergence rates) As we shall show below, the minimax global convergence rates 
in Theorem 1 are attainable up to a logarithmic factor. The rates are determined by the interaction 
of four parameters, $s$, $p$, $\alpha$ and $\beta$. Parameters $s$ and $p$ describe, respectively, the smoothness and spatial 
homogeneity of the unknown function $f$, while $\beta$ and $\alpha$, defined in (2.6), are referred to as degrees 
of ill-posedness and spatial inhomogeneity of operator $Q$. If the value of $\alpha$ is large, in particular, 
$2s\alpha > 2s' + \beta(1 - 2/p)$, we say that operator $Q$ is strongly inhomogeneous, while in the case when 
$2s\alpha < 2s' + \beta(1 - 2/p)$ we call operator $Q$ weakly inhomogeneous. The case when $2s\alpha = 2s' + \beta(1 - 2/p)$ 
is referred to as moderately inhomogeneous. Observe that in the weakly inhomogeneous case, 
the spatial inhomogeneity of operator $Q$ does not affect the convergence rate, which is determined entirely 
by the degree of ill-posedness $\beta$. On the other hand, for large values of $\alpha$, the convergence rate is 
significantly affected by the degree of spatial inhomogeneity $\alpha$ of $Q$. 
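
The classification in Remark 2 and the exponent of $\varepsilon$ in the lower bound (3.3) can be tabulated with a short helper. This is a minimal sketch rather than part of the paper: the function names are hypothetical, $s' = s + 1/2 - 1/p$ is used as in (3.1), and the "moderate" case is detected by exact floating-point equality, which is only reliable for simple parameter values.

```python
def s_prime(s, p):
    """s' = s + 1/2 - 1/p; p may be float('inf')."""
    return s + 0.5 - 1.0 / p

def regime(s, p, alpha, beta):
    """Classify operator Q as 'weak', 'moderate' or 'strong' following Remark 2."""
    lhs = 2.0 * s * alpha
    rhs = 2.0 * s_prime(s, p) + beta * (1.0 - 2.0 / p)
    if lhs > rhs:
        return "strong"
    if lhs < rhs:
        return "weak"
    return "moderate"

def epsilon_exponent(s, p, alpha, beta):
    """Exponent of epsilon in the lower bound Delta(epsilon) of (3.3)."""
    sp = s_prime(s, p)
    if 2.0 * s * (alpha - 1.0) > (beta + 1.0) * (1.0 - 2.0 / p):
        return 2.0 * sp / (2.0 * sp + alpha + beta)
    return 2.0 * s / (2.0 * s + beta + 1.0)
```

For example, with $s = 2$, $p = 2$ and $\beta = 1$, the value $\alpha = 0.5$ falls into the weakly inhomogeneous case with exponent $2s/(2s+\beta+1) = 2/3$, whereas $\alpha = 3$ is strongly inhomogeneous with exponent $2s'/(2s'+\alpha+\beta) = 1/2$; at $\alpha = 1$ the two expressions coincide.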
4 Estimation strategies in the presence of a singularity 

To be more specific, we consider a periodized version of the wavelet basis on the unit interval 

$$\{\varphi_{mk},\ \psi_{jk} :\ j \ge m,\ k = 0, 1, \ldots, 2^j - 1\}, \qquad (4.1)$$

where $\varphi_{mk}(x) = 2^{m/2} \varphi(2^m x - k)$, $\psi_{jk}(x) = 2^{j/2} \psi(2^j x - k)$, $x \in [0,1]$. Note that the latter 
requires that the resolution level $m$ is high enough, in particular, $m \ge m_1$, where $m_1$ is such that 

$$2^{m_1} \ge \max(\mathcal{L}_{\varphi^*}, \mathcal{L}_{\psi^*}). \qquad (4.2)$$

Here, $\mathcal{L}_{\varphi^*}$ and $\mathcal{L}_{\psi^*}$ are the lengths of the supports of the father and mother wavelets, $\varphi^*$ and $\psi^*$, that 
generate the periodized wavelet basis. Then, for any $m \ge m_1$, the set (4.1) forms an orthonormal 
wavelet basis for $L^2([0,1])$, and, hence, any $f \in L^2([0,1])$ can be expanded using formula (2.5). 
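Under Assumptions (A1), (A2) and (A4), one can construct unbiased estimators of coefficients $a_{mk}$ 
and $b_{jk}$ 

$$\hat a_{mk} = \langle y, t_{m,k} \rangle, \qquad \hat b_{jk} = \langle y, u_{j,k} \rangle. \qquad (4.3)$$

If $k \in K^c_{0m}$ and $k \in K^c_{1j}$, respectively, then estimators $\hat a_{mk}$ and $\hat b_{jk}$ have finite variances 

$$\mathrm{Var}(\hat a_{mk}) \asymp \varepsilon \lambda_{m,k}^{-2}, \quad k \in K^c_{0m}, \qquad \mathrm{Var}(\hat b_{jk}) \asymp \varepsilon \lambda_{j,k}^{-2}, \quad k \in K^c_{1j}, \qquad (4.4)$$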

and have infinite variances otherwise. In order to account for the latter, for any $m \ge m_1$, we 
partition $f$ into the sum of singularity-affected and singularity-free parts 

$$f(x) = f_{0,m}(x) + f_{c,m}(x), \quad x \in [0,1],$$

where 

$$f_{0,m}(x) = \sum_{k \in K_{0m}} a_{mk} \varphi_{mk}(x) + \sum_{j=m}^{\infty} \sum_{k \in K_{1j}} b_{jk} \psi_{jk}(x), \quad x \in [0,1], \qquad (4.5)$$

$$f_{c,m}(x) = \sum_{k \in K^c_{0m}} a_{mk} \varphi_{mk}(x) + \sum_{j=m}^{\infty} \sum_{k \in K^c_{1j}} b_{jk} \psi_{jk}(x), \quad x \in [0,1]. \qquad (4.6)$$

We then construct estimators $\hat f_{0,m}$ and $\hat f_{c,m}$ of $f_{0,m}$ and $f_{c,m}$, respectively, and estimate $f$ by a 
hybrid estimator 

$$\hat f_m(x) = \hat f_{0,m}(x) + \hat f_{c,m}(x), \quad x \in [0,1]. \qquad (4.7)$$

In particular, we shall use a linear estimator with the resolution level $m$ estimated from the data as 
$\hat f_{0,m}$ and a nonlinear block thresholding wavelet estimator as $\hat f_{c,m}$, with the lowest resolution level 
$m$ in $\hat f_{c,m}$ determined by the linear part $\hat f_{0,m}$. In what follows, we shall consider estimation of $f_{0,m}$ 
and $f_{c,m}$ separately. 

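First, we construct a block thresholding wavelet estimator $\hat f_{c,m}$ of $f_{c,m}$. For this purpose, we 
divide the wavelet coefficients at each resolution level into $l^L_{j\varepsilon}$ blocks of length $\ln(\varepsilon^{-1})$ to the left of 
$(k_{0j} - D)$ and $l^R_{j\varepsilon}$ blocks to the right of $(k_{0j} + D)$, where 

$$l^L_{j\varepsilon} = (k_{0j} - D)/\ln(\varepsilon^{-1}), \qquad l^R_{j\varepsilon} = (2^j - D - k_{0j})/\ln(\varepsilon^{-1}), \qquad l_{j\varepsilon} = \max(l^L_{j\varepsilon}, l^R_{j\varepsilon}). \qquad (4.8)$$

Define blocks $U^L_{jl}$ and $U^R_{jl}$ of indices $k$ to the left of $(k_{0j} - D)$ and to the right of $(k_{0j} + D)$, respectively, 
as 

$$U^L_{jl} = \{k : k_{0j} - D - l \ln(\varepsilon^{-1}) < k \le k_{0j} - D - (l-1) \ln(\varepsilon^{-1})\}, \quad l \in U^L_j, \qquad (4.9)$$

$$U^R_{jl} = \{k : k_{0j} + D + (l-1) \ln(\varepsilon^{-1}) \le k < k_{0j} + D + l \ln(\varepsilon^{-1})\}, \quad l \in U^R_j, \qquad (4.10)$$

where 

$$U^L_j = \{l : 1 \le l \le l^L_{j\varepsilon}\}, \qquad U^R_j = \{l : 1 \le l \le l^R_{j\varepsilon}\}, \qquad U_j = U^L_j \cup U^R_j. \qquad (4.11)$$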
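
One possible reading of the block construction above is sketched below. It is an illustration under explicit assumptions, not the paper's implementation: the function name is hypothetical, $D > 0$ is assumed, blocks are half-open intervals of length $\ln(\varepsilon^{-1})$ that start at $k_{0j} - D$ (moving left) and at $k_{0j} + D$ (moving right), and blocks are generated until the index range $0, \ldots, 2^j - 1$ is exhausted rather than by rounding the block counts.

```python
import math

def make_blocks(j, x0, D, eps):
    """Return (left_blocks, right_blocks): lists of lists of integer indices k
    at resolution level j, with block length ln(1/eps). Assumes D > 0 and eps < 1."""
    n = 2 ** j
    k0 = n * x0
    length = math.log(1.0 / eps)
    left, right = [], []
    l = 1
    # left blocks: k0 - D - l*length < k <= k0 - D - (l-1)*length
    while k0 - D - (l - 1) * length >= 0:
        lo, hi = k0 - D - l * length, k0 - D - (l - 1) * length
        left.append([k for k in range(n) if lo < k <= hi])
        l += 1
    l = 1
    # right blocks: k0 + D + (l-1)*length <= k < k0 + D + l*length
    while k0 + D + (l - 1) * length <= n - 1:
        lo, hi = k0 + D + (l - 1) * length, k0 + D + l * length
        right.append([k for k in range(n) if lo <= k < hi])
        l += 1
    return left, right
```

With this convention the blocks are disjoint and together cover exactly the singularity-free indices $\{k : |k - k_{0j}| \ge D\}$, and each block contains at most $\lceil \ln(\varepsilon^{-1}) \rceil$ indices.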

To simplify the narrative, we shall write $l \in U_j$ and $k \in U_{jl}$ without a specific reference to whether a 
block lies to the right or to the left of $k_{0j}$. Denote 

B JI = E b %> = E &?k, ( 4 - 12 ) 

^ fa " \D + (l-l)\n(e^r ^ V ( 5 

For any m > mi, estimate / cm by 

E E E W^i* ^ r 2 % £ )^fcOr), (4.14) 

where I(f2) is the indicator function of the set f2, the value of r will be defined later and 

2 J = e-^+l+2. (4.15) 

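Now, consider estimation of the singularity-affected part. Since the estimators $\hat a_{mk}$ of $a_{mk}$ 
given in (4.3) have infinite variances when $k \in K_{0m}$, we estimate those coefficients by solving a 
system of linear equations. Denote $w_{m,k} = Q\varphi_{mk}$ and observe that, for a given $m$, $m_1 \le m \le J - 1$, 
one has $f = f_m + R_m$. Here 

$$f_m = \sum_{k \in K_{0m}} a_{mk} \varphi_{mk} + \sum_{k \in K^c_{0m}} a_{mk} \varphi_{mk}, \qquad R_m = \sum_{j=m}^{\infty} \sum_{k=0}^{2^j-1} b_{jk} \psi_{jk}, \qquad (4.16)$$

and, hence, 

$$Qf = \sum_{k \in K_{0m}} a_{mk} w_{m,k} + \sum_{k \in K^c_{0m}} a_{mk} w_{m,k} + QR_m. \qquad (4.17)$$

Taking scalar products of both sides of (4.17) with $w_{m,l}$, $l \in K_{0m}$, obtain 

$$\langle w_{m,l}, Qf \rangle = \sum_{k \in K_{0m}} a_{mk} \langle w_{m,l}, w_{m,k} \rangle + \sum_{k \in K^c_{0m}} a_{mk} \langle w_{m,l}, w_{m,k} \rangle + \langle w_{m,l}, QR_m \rangle, \quad l \in K_{0m}. \qquad (4.18)$$

Introduce matrices $A^{(m)}$ and $B^{(m)}$ and vectors $c^{(m)}$, $\hat c^{(m)}$, $r^{(m)}$, $z^{(m)}$, $h^{(m)}$ and $\hat h^{(m)}$ with elements 

$$A^{(m)}_{lk} = \langle w_{m,l}, w_{m,k} \rangle, \quad k, l \in K_{0m}, \qquad B^{(m)}_{lk} = \langle w_{m,l}, w_{m,k} \rangle, \quad k \in K^c_{0m},\ l \in K_{0m}, \qquad (4.19)$$

$$c^{(m)}_l = \langle w_{m,l}, Qf \rangle, \quad l \in K_{0m}, \qquad \hat c^{(m)}_l = \langle w_{m,l}, y \rangle, \quad l \in K_{0m}, \qquad (4.20)$$

$$r^{(m)}_l = \langle w_{m,l}, QR_m \rangle, \quad l \in K_{0m}, \qquad z^{(m)}_k = a_{mk}, \quad k \in K_{0m}, \qquad (4.21)$$

$$h^{(m)}_k = a_{mk}, \quad k \in K^c_{0m}, \qquad \hat h^{(m)}_k = \hat a_{mk} = \langle y, t_{m,k} \rangle, \quad k \in K^c_{0m}, \qquad (4.22)$$

where $\hat a_{mk}$, $k \in K^c_{0m}$, are defined in (4.3). Then, one can re-write the exact system of linear equations 
(4.18) as $c^{(m)} = A^{(m)} z^{(m)} + B^{(m)} h^{(m)} + r^{(m)}$ and obtain its approximate version 

$$\hat c^{(m)} = A^{(m)} \hat z^{(m)} + B^{(m)} \hat h^{(m)}. \qquad (4.23)$$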
Since matrix $A^{(m)}$ is a nonnegative definite matrix of a finite size, in order to guarantee that it is 
nonsingular, it is sufficient to impose the following almost trivial assumption: 

(A5) Functions $w_{m,k} = Q\varphi_{mk}$, $k \in K_{0m}$, are linearly independent. 

Under Assumption (A5), one has 

$$z^{(m)} = (A^{(m)})^{-1} \left( c^{(m)} - B^{(m)} h^{(m)} - r^{(m)} \right), \qquad \hat z^{(m)} = (A^{(m)})^{-1} \left( \hat c^{(m)} - B^{(m)} \hat h^{(m)} \right). \qquad (4.24)$$

Finally, for a given $m$, we set $\hat a_{mk} = \hat z^{(m)}_k$, $k \in K_{0m}$, and estimate $f_{0,m}$ by the following linear 
wavelet estimator 

$$\hat f_{0,m}(x) = \sum_{k \in K_{0m}} \hat a_{mk} \varphi_{mk}(x), \quad x \in [0,1]. \qquad (4.25)$$
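
The linear-algebra step in (4.23)-(4.24) can be sketched as follows. This is a hedged illustration, not the paper's implementation: the function name is hypothetical, and the Gram matrix `G` of the functions $w_{m,k}$ is assumed to have been computed beforehand (for instance, by numerical quadrature on a grid). The system is solved directly instead of forming the inverse of $A^{(m)}$ explicitly.

```python
import numpy as np

def solve_affected_coefficients(G, c_hat, h_hat, affected, free):
    """Solve A z = c_hat - B h_hat for the singularity-affected coefficients.

    G        : Gram matrix with entries <w_{m,l}, w_{m,k}> for all l, k
    c_hat    : vector of <w_{m,l}, y>, l in `affected`
    h_hat    : vector of estimated coefficients a_{mk}, k in `free`
    affected : list of indices in K_{0m}
    free     : list of indices in the complement of K_{0m}
    """
    A = G[np.ix_(affected, affected)]
    B = G[np.ix_(affected, free)]
    return np.linalg.solve(A, c_hat - B @ h_hat)
```

A quick sanity check, with random columns standing in for discretized $w_{m,k}$ (an assumption of this sketch): in the noise-free case with $R_m = 0$, the exact coefficients on the affected set are recovered.

```python
rng = np.random.default_rng(0)
W = rng.standard_normal((200, 8))      # columns play the role of w_{m,k}
G = W.T @ W / 200.0                    # discretized inner products
a = rng.standard_normal(8)             # true scaling coefficients
affected, free = [3, 4], [0, 1, 2, 5, 6, 7]
c = (G @ a)[affected]                  # <w_{m,l}, Qf> with Qf = sum_k a_k w_{m,k}
z = solve_affected_coefficients(G, c, a[free], affected, free)
```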

Remark 3 (Relation to nonparametric regression estimation based on spatially inhomogeneous 
data) We need to touch upon the relationship between the present paper and the paper 
by Antoniadis, Pensky and Sapatinas (2012), which considered nonparametric regression estimation 
based on irregularly spaced data, in particular, in the case when the design density has zeros. The latter 
problem is a well-known formulation and has been studied extensively by many authors, including 
the case of a design density with zeros (see, e.g., Gaiffas (2005, 2006, 2007, 2009)). On the other 
hand, the present paper introduces a completely novel notion of a spatially inhomogeneous linear 
inverse problem, discusses the best precision with which the unknown function $f$ can be recovered in 
the case when $f$ itself is possibly spatially inhomogeneous, and suggests an estimation algorithm which 
attains this precision in an adaptive fashion. 

The common ground between the two papers is that regression estimation with a vanishing 
design density is indeed an example of a spatially inhomogeneous ill-posed problem and can be 
considered as a trivial case of deconvolution with spatially inhomogeneous design in Section 7.2 with 
$Q$ being the identity operator. For this reason, the hybrid estimator was proposed in Antoniadis, 
Pensky and Sapatinas (2012), although, due to the fact that in the regression set-up one observes 
function $f$ directly, construction of the hybrid estimator is much more involved in the case of 
an inverse problem than in the case of nonparametric regression. In addition, the present paper 
provides an implementation of the hybrid estimator and studies its performance via simulations, 
which has never been done previously since Antoniadis, Pensky and Sapatinas (2012) considered 
only the theoretical construction of the hybrid estimator. 

5 Risks of the estimators of the singularity-free and the singularity-affected parts

In this section we shall provide asymptotic expressions for the risks of estimators (4.14) and (4.25) when the lowest resolution level $m$ in both of them is a fixed, non-random quantity possibly dependent on $\varepsilon$: $m = m(\varepsilon)$.

Let us first construct an asymptotic upper bound for the singularity-free portion (4.14) of the estimator. Denote

and observe that, under condition (2.6), there exist positive constants $C_{\lambda\alpha 0}$ and $C_{\lambda\alpha}$ independent of $m$ such that

$$C_{\lambda\alpha 0}\, 2^{m(\beta + \max(1,\alpha))}\, m^{I(\alpha=1)} \le \lambda_m^{-2} \le C_{\lambda\alpha}\, 2^{m(\beta + \max(1,\alpha))}\, m^{I(\alpha=1)}. \qquad (5.2)$$

Lemma 1 Let $1 \le p, q \le \infty$, $s > \max(1/2, 1/p)$, $s^* = \min(s, s')$ and Assumptions (A1)-(A4) hold. Let $\hat f_{c,m}$ be given by (4.14) where the non-random quantity $m = m(\varepsilon)$ is such that $m_1 \le m \le J - 1$, with $J$ defined in (4.15). If

$$\tau^2 = 4 C_u C_\lambda \left(\sqrt{2}\,\chi + 1\right)^2 \quad \text{and} \quad \chi^2 \ge \frac{4(\beta + \max(1,\alpha))}{2 + \alpha + \beta}, \qquad (5.3)$$

where $C_u$ and $C_\lambda$ are defined in (2.2) and (2.6), respectively, then, as $\varepsilon \to 0$,

$$\sup_{f \in B^s_{p,q}(A)} E\|\hat f_{c,m} - f_{c,m}\|^2 \le C\left(\lambda_m^{-2}\,\varepsilon + \Delta(\varepsilon)\,[\ln(\varepsilon^{-1})]^{\varrho}\right). \qquad (5.4)$$

Here $\Delta(\varepsilon)$ is defined in (3.3),

= f I(2«(a - 1) = 03 + 1)(1 ~ 2/p)) + (1 T-gT P) % < 1) I(P < 2), if a^l, , , 

\ (2s* + /? + l)- 1 2s*, if o = l. 

and $I$ is the indicator function. Moreover, as $\varepsilon \to 0$,

$$\sup_{f \in B^s_{p,q}(A)} E\|\hat f_{c,m} - f_{c,m}\|^4 = o(\varepsilon^{-2}). \qquad (5.6)$$

Now, we find upper bounds for the singularity-affected portion $\hat f_{0,m}(x)$ of the estimator. Recall that $w_{m,k} = Q\varphi_{mk}$ and let $\rho_m$ be such that

$$C_{w1}\,\rho_m \le \|w_{m,k}\| \le C_{w2}\,\rho_m \quad \text{if } k \in K_{0m}. \qquad (5.7)$$

Note that, since the set $K_{0m}$ contains at most $2D_0$ indices, $\rho_m$ satisfying condition (5.7) can always be found. The advantage of using the system of equations (4.23) rests upon the fact that matrix $A^{(m)}$ is a finite dimensional positive definite matrix with all eigenvalues of order $\rho_m^2$. In particular,

$$\|A^{(m)}\| \le C_{A1}\,\rho_m^2, \qquad \|(A^{(m)})^{-1}\| \le C_{A2}\,\rho_m^{-2} \qquad (5.8)$$

for some positive constants $C_{A1}$ and $C_{A2}$ independent of $m$, as shown in the proof of the following lemma, which states the rate of convergence of the singularity-affected portion of the estimator.

Lemma 2 Let $1 \le p, q \le \infty$ and $s > \max(1/2, 1/p)$. Let Assumptions (A1)-(A5) hold and let there exist $C_{\rho\lambda}$, $0 < C_{\rho\lambda} < \infty$, independent of $m$, such that

$$\rho_m^2 \ge C_{\rho\lambda}\,\lambda_m^2, \qquad (5.9)$$

where $\lambda_m$ and $\rho_m$ are defined in (5.1) and (5.7), respectively. Let $\alpha \ge 1$ and also

Pm 4 max V K^ k {w m>h w mtk ) 2 < KxX^ (5.10) 

OO 2^-1 

, m ^ x \( w m,l,Vj,k}\ < K 2 p 2 m , (5.11) 

'E-KOm , ' 

j=m k=0 

for some absolute constants $K_1$ and $K_2$ independent of $m$. If estimator $\hat f_{0,m}$ of $f_{0,m}$ is given by (4.25), then for any $m$, $m_1 \le m \le J - 1$, and some constant $C$ independent of $m$ and $\varepsilon$, as $\varepsilon \to 0$, one has

$$\sup_{f \in B^s_{p,q}(A)} E\|\hat f_{0,m} - f_{0,m}\|^2 \le C\left(\varepsilon\,\lambda_m^{-2} + 2^{-2ms'}\right), \qquad (5.12)$$

and, moreover,

$$\sup_{f \in B^s_{p,q}(A)} E\|\hat f_{0,m} - f_{0,m}\|^4 = o(\varepsilon^{-2}). \qquad (5.13)$$

Note that in order for $\hat f_m = \hat f_{c,m} + \hat f_{0,m}$ to estimate $f$, one needs to start the estimator $\hat f_{c,m}$ in (4.14) at exactly the same resolution level at which the linear estimator $\hat f_{0,m}$ in (4.25) is constructed. Thus, the choice of the lowest resolution level in (4.14) is driven by the choice of $m$ in (4.25).

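Let $m_0$ be such that

$$2^{m_0} = \left(\varepsilon\,[\ln(\varepsilon^{-1})]^{I(\alpha=1)}\right)^{-\frac{1}{2s' + \alpha + \beta}}, \qquad (5.14)$$

so that, for $\alpha \ge 1$, one has

$$\varepsilon\,\lambda_{m_0}^{-2} + 2^{-2 m_0 s'} \le C\,\Delta(\varepsilon)\,[\ln(\varepsilon^{-1})]^{\varrho}. \qquad (5.15)$$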
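For a concrete feel of how the oracle level scales with the noise level, the following sketch evaluates $m_0$ numerically, assuming that (5.14) reads $2^{m_0} = (\varepsilon[\ln(\varepsilon^{-1})]^{I(\alpha=1)})^{-1/(2s'+\alpha+\beta)}$. The function name `oracle_resolution_level` is hypothetical, and the returned value is real-valued; rounding it to an integer level is left to the user.

```python
import math


def oracle_resolution_level(eps, s_prime, alpha, beta):
    """Real-valued m_0 with 2^{m_0} = (eps * [ln(1/eps)]^{I(alpha=1)})^{-1/(2 s' + alpha + beta)}."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    # The logarithmic factor enters only in the boundary case alpha = 1.
    log_factor = math.log(1.0 / eps) if alpha == 1 else 1.0
    exponent = -1.0 / (2.0 * s_prime + alpha + beta)
    return exponent * math.log2(eps * log_factor)
```

For instance, with $\varepsilon = 2^{-12}$, $s' = 1$ and $\alpha = \beta = 2$ the exponent is $-1/6$, so $m_0 = 2$. Since $s'$ depends on the unknown smoothness, this quantity is not available in practice, which is what motivates the adaptive choice of the resolution level.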
The following statement delivers the total squared risk of the estimator (4.7) of $f$ if $DD_0 > 0$.

Theorem 2 Let $1 \le p, q \le \infty$ and $s > \max(1/2, 1/p)$. Let conditions (5.9)-(5.11) and Assumptions (A1)-(A5) hold with $\alpha \ge 1$ and $DD_0 > 0$ in Assumptions (A2) and (A4). Consider estimator (4.7) of $f$ where $\hat f_{c,m}$ and $\hat f_{0,m}$ are given by formulae (4.14) and (4.25), respectively. Let $m = m_0$ where $m_0$ is defined in (5.14), $J$ be defined in (4.15) and let positive constants $\tau$ and $\chi$ satisfy condition (5.3). Then, for some constant $C$ independent of $\varepsilon$, as $\varepsilon \to 0$, one has

$$\sup_{f \in B^s_{p,q}(A)} E\|\hat f_{m_0} - f\|^2 \le C\,\Delta(\varepsilon)\,[\ln(\varepsilon^{-1})]^{\varrho}, \qquad (5.16)$$

where $\Delta(\varepsilon)$ and $\varrho$ are defined in (3.3) and (5.5), respectively.

Validity of Theorem 2 follows directly from Lemmas 1 and 2 and inequality (5.15).

Remark 4 (Optimality) Note that in (5.5), $\varrho = 0$ unless $2s(\alpha - 1) = (\beta + 1)(1 - 2/p)$ or $\alpha = 1$ or $\alpha < 1$ and $p < 2$. The latter shows that the lower bounds for the risk in Theorem 1 cannot be made tighter, at least, in the case when $\alpha > 1$. Theorems 1 and 2 and Corollary 1 demonstrate that the estimator (4.7) attains the asymptotically optimal (in the minimax sense) convergence rates if $2s(\alpha - 1) \ne (\beta + 1)(1 - 2/p)$ and $\alpha > 1$ or if $\alpha < 1$ and $p \ge 2$. Otherwise, estimator (4.7) is asymptotically near-optimal up to a logarithmic factor.

Note that the value of $m_0$ depends on the unknown parameters $s$ and $p$ of the Besov space, hence, in general, estimator $\hat f_{m_0}$ in (5.16) is not adaptive. However, if $D = D_0 = 0$, then $\hat f_{0,m} = f_{0,m} = 0$, $\hat f = \hat f_{c,m}$, and one can choose $m = m_1$ in $\hat f_{c,m}$ using formula (4.2), so that the estimator is adaptive. In this case, convergence rates of $\hat f$ are given entirely by Lemma 1. In particular, the following Corollary is valid.

Corollary 1 Let $D = D_0 = 0$ and assumptions of Lemma 1 hold. Consider estimator $\hat f_{m_1} = \hat f_{c,m_1}$ given by (4.14) with $m = m_1$. Then, as $\varepsilon \to 0$,

$$\sup_{f \in B^s_{p,q}(A)} E\|\hat f_{m_1} - f\|^2 \le C\,\Delta(\varepsilon)\,[\ln(\varepsilon^{-1})]^{\varrho}, \qquad (5.17)$$

where $\Delta(\varepsilon)$ and $\varrho$ are defined in (3.3) and (5.5), respectively.

If $DD_0 > 0$, then it is necessary to construct an adaptive estimator of $f$. Note that this is not an easy task. Expanding the system of equations in (4.18) so that it includes not only the scaling but also the wavelet coefficients will compromise uniformity of eigenvalues of matrix $A^{(m)}$ (see (5.8)), which is ensured by positive-definiteness and finite size of $A^{(m)}$. On the other hand, introducing a penalty on the solution does not help either since, for any $m$, the system involves the unknown bias term $r^{(m)}$. For this reason, in order to choose parameter $m$, we apply the Lepskii method since it allows one to eliminate the bias inherent to the system of equations (4.23).

6 Adaptive estimation in the presence of singularity



In order to construct an adaptive estimator of $f$ in the presence of singularity, we shall use the technique of optimal tuning parameter selection pioneered by Lepski (1990, 1991) and further exploited in Lepski and Spokoiny (1997) and Lepski et al. (1997). The idea behind this technique is to construct estimators for various values of the tuning parameter in question ($m$, in our case), and then choose an optimal value of the tuning parameter by regulating the differences between the estimators constructed with different values of the parameter.

In particular, for various values of $m$, we construct versions of the system of equations, obtain values of $\hat z^{(m)}$ in (4.24) and use them as $\hat a_{mk} = \hat z^{(m)}_k$, $k \in K_{0m}$, in (4.25). Then, for various values of $m$, we obtain estimators $\hat f_m$ of $f$ using formula (4.7) where $\hat f_{0,m}$ and $\hat f_{c,m}$ are of the forms (4.25) and (4.14), respectively, and $m$ is the lowest resolution level of $\hat f_{c,m}$. After that, we choose the "best possible" resolution level $\hat m$ and consider estimator $\hat f_{\hat m}$ as the final estimator. The choice of the resolution level $\hat m$ is driven by the singularity-affected portion of $\hat f$ rather than the zero-free portion, as described below.

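For any resolution level $m > 0$, we define a neighborhood $\Omega_m$ of $x_0$ as

$$\Omega_m = \left\{x : \min(L_\varphi - D_0,\, L_\psi - D) \le 2^m (x - x_0) \le \max(U_\varphi + D_0,\, U_\psi + D)\right\}, \qquad (6.1)$$

where $\operatorname{supp}\varphi = (L_\varphi, U_\varphi)$ and $\operatorname{supp}\psi = (L_\psi, U_\psi)$. Observe that $\Omega_m$ is designed so that $\operatorname{supp}(\hat f_{0,m}) \subseteq \Omega_m$, $\operatorname{supp}(f_{0,m}) \subseteq \Omega_m$ and $\Omega_j \subseteq \Omega_m$ if $j > m$.

Choose $m = \hat m$ such that $m_1 \le \hat m \le J - 1$, where $J$ is defined in (4.15) and

$$\hat m = \min\left\{m : \|(\hat f_m - \hat f_j)\, I(\Omega_m)\|^2 \le \kappa^2\, \varepsilon \ln(\varepsilon^{-1})\, \lambda_j^{-2} \ \text{ for all } j,\ m < j \le J - 1\right\}, \qquad (6.2)$$

where $\kappa > 0$ is a constant to be defined below.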

The construction of $\hat m$ is based on the following idea. Note that when $m = \hat m \le m_0$, then one has

$$E\|\hat f_{\hat m} - f\|^2 \le 2\left[E\|\hat f_{\hat m} - \hat f_{m_0}\|^2 + E\|\hat f_{m_0} - f\|^2\right]. \qquad (6.3)$$

The first component in (6.3) is small due to the definition of the resolution level $\hat m$, while the second component is calculated at the optimal resolution level $m_0$ and, hence, tends to zero at the optimal convergence rate (up to a logarithmic factor). On the other hand, if $m = \hat m > m_0$, then there exists $j > m_0$ such that $\|(\hat f_{m_0} - \hat f_j)\, I(\Omega_{m_0})\|^2 > \kappa^2\, \varepsilon \ln(\varepsilon^{-1})\, \lambda_j^{-2}$. The following Lemma shows that, provided $\kappa$ is large enough, the probability of this event is infinitesimally small.

Lemma 3 Let $m_0$ and $\hat m$ be given by expressions (5.14) and (6.2), respectively, and conditions of Theorem 2 hold. Let $C_\kappa = 2^{11} D_0\, C_{A2}^2 \max\left(C_{\rho\lambda}^{-1} C_{w2}^2,\ C_t K_1\right)$ where constants $C_{A2}$, $C_{\rho\lambda}$, $C_{w2}$, $C_t$ and $K_1$ are defined in (5.8), (5.9), (5.7), Section 2 and (5.10), respectively. If

$$\kappa \ge \max\left(d^2 C_\kappa,\ 2^6 (d^2 + 1)\, C_t\right), \qquad \chi^2 \ge d^2 + 2/(2 + \alpha + \beta), \qquad (6.4)$$

then, as $\varepsilon \to 0$,

$$P(\hat m > m_0) \le C\,\varepsilon^{d^2}. \qquad (6.5)$$

Lemma 3 confirms that indeed $m = \hat m$ can be chosen as the lowest resolution level in the nonlinear portion of the estimator, so that we estimate $f$ by

$$\hat f(x) = \hat f_{0,\hat m}(x) + \hat f_{c,\hat m}(x), \quad x \in [0, 1], \qquad (6.6)$$

where $\hat f_{0,\hat m}(x)$ and $\hat f_{c,\hat m}(x)$ are defined in (4.25) and (4.14), respectively. The following statement confirms that the wavelet nonlinear estimator $\hat f$ given by (6.6) indeed attains (up to a logarithmic factor) the asymptotic minimax lower bounds obtained in Theorem 1.

Theorem 3 Let $1 \le p, q \le \infty$ and $s > \max(1/2, 1/p)$. Let conditions (5.9)-(5.11) and Assumptions (A1)-(A5) hold with $\alpha \ge 1$ and $DD_0 > 0$ in Assumptions (A2) and (A4). Consider estimator (4.7) of $f$ where $\hat f_{c,m}$ and $\hat f_{0,m}$ are given by formulae (4.14) and (4.25), respectively. Let $m = \hat m$ where $\hat m$ is defined in (6.2). Let $J$ be defined in (4.15) and $\kappa$ and $\chi$ be such that

$$\kappa \ge \max(4 C_\kappa,\ 320\, C_t), \qquad \chi^2 \ge 4 + 2/(2 + \alpha + \beta), \qquad (6.7)$$

where $C_\kappa$ and $C_t$ are defined in Lemma 3 and Section 2, respectively. Then, for some constant $C$ independent of $\varepsilon$, as $\varepsilon \to 0$, one has

$$\sup_{f \in B^s_{p,q}(A)} E\|\hat f_{\hat m} - f\|^2 \le C\,\Delta(\varepsilon)\,[\ln(\varepsilon^{-1})]^{1 + I(\alpha=1)}, \qquad (6.8)$$

where $\Delta(\varepsilon)$ is defined in (3.3).

Remark 5 (Logarithmic factor in convergence rates) Observe that in Theorem 2 convergence rates are sharp if $\alpha > 1$ or $p \ge 2$ unless $2s(\alpha - 1) = (\beta + 1)(1 - 2/p)$. However, in Theorem 3 the risk of the adaptive estimator is always within a logarithmic factor $[\ln(\varepsilon^{-1})]^{1 + I(\alpha=1)}$ of the minimax risk. The latter is due to application of the Lepskii method. Note that in spite of the fact that we are using mean squared error, the Lepskii method is applied locally and hence leads to an extra log-factor in the risk, as usually happens when the Lepskii method is applied to pointwise estimation.

7 Examples 

7.1 Deconvolution with a spatially inhomogeneous kernel 

Consider problem (1.1) with operator $Q$ of the form

$$(Qf)(x) = \mu(x) \int_0^1 q(x - t)\, f(t)\, dt, \quad x \in [0, 1], \qquad (7.1)$$

where functions $\mu(x)$, $q(x)$ and $f(x)$ in the right-hand side of equation (7.1) are periodic and both $q(x)$ and $\mu(x)$ are completely known. In this case, problem (1.1) is known to be equivalent to the following statistical problem

$$y_i = \mu(i/n) \int_0^1 q(i/n - t)\, f(t)\, dt + \sigma \xi_i, \quad i = 1, \dots, n, \qquad (7.2)$$

where $\xi_i$ is a white Gaussian noise and $\varepsilon = \sigma^2/n$. An equation of the form (7.1) can appear when one observes a convolution $Y(x)$ of the known kernel $q$ with the unknown function of interest $f$ and a known heteroscedastic noise $\sqrt{\varepsilon}\,\gamma(x) W(x)$, so that $\mu(x) = [\gamma(x)]^{-1}$ and $Y(x) = \gamma(x) y(x) = y(x)/\mu(x)$. In this case, equation (7.2) takes the form

$$Y_i = \int_0^1 q(i/n - t)\, f(t)\, dt + \sigma \gamma(i/n)\, \xi_i, \quad i = 1, \dots, n. \qquad (7.3)$$

If $\mu(x)$ is uniformly bounded above and below, in principle, spatial inhomogeneity of operator $Q$ in (7.1) can be ignored. Below we consider the case when this is not true since $\mu(x)$ vanishes at some point $x_0 \in (0, 1)$, in particular,

$$C_{\mu 1} |x - x_0|^{\alpha} \le \mu^2(x) \le C_{\mu 2} |x - x_0|^{\alpha} \qquad (7.4)$$

for some positive constants $\alpha$, $C_{\mu 1}$ and $C_{\mu 2}$ independent of $x_0$ and $x$. Therefore, the version of the problem studied in the present paper can be described as locally extreme noise which arises when the degree of spatial inhomogeneity is high.

Direct calculations show that



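$$(Q^* h)(z) = \int_0^1 q(x - z)\, h(x)\, \mu(x)\, dx.$$

Hence, $u_{j,k}$ and $t_{m,k}$ are solutions of the equations $Q^* u_{j,k} = \psi_{jk}$ and $Q^* t_{m,k} = \varphi_{mk}$. Consider $\omega \in \mathbb{Z}$ and let $e_\omega(t) = e^{i 2\pi \omega t}$, $t \in [0, 1]$, be the periodic Fourier basis on $[0, 1]$. Denote $U_{jk}(x) = u_{j,k}(x)\mu(x)$ and, similarly, $T_{mk}(x) = t_{m,k}(x)\mu(x)$ and introduce Fourier coefficients $q_\omega = \langle q, e_\omega\rangle$, $U_{jk\omega} = \langle U_{jk}, e_\omega\rangle$, $T_{mk\omega} = \langle T_{mk}, e_\omega\rangle$, $\psi_{jk\omega} = \langle \psi_{jk}, e_\omega\rangle$ and $\varphi_{mk\omega} = \langle \varphi_{mk}, e_\omega\rangle$. Then, $U_{jk\omega} = [\bar q_\omega]^{-1} \psi_{jk\omega}$, $T_{mk\omega} = [\bar q_\omega]^{-1} \varphi_{mk\omega}$ and

$$U_{jk}(x) = \sum_\omega [\bar q_\omega]^{-1}\, \psi_{jk\omega}\, e_\omega(x), \qquad T_{mk}(x) = \sum_\omega [\bar q_\omega]^{-1}\, \varphi_{mk\omega}\, e_\omega(x), \qquad (7.5)$$

where $\bar q_\omega$ is the complex conjugate of $q_\omega$. Moreover, estimators $\hat a_{mk}$ and $\hat b_{jk}$ in (4.3) can be constructed using the Fourier wavelet transform suggested in Johnstone et al. (2004). Indeed, if $Y(x) = y(x)/\mu(x)$ and $Y_\omega$ are Fourier coefficients of function $Y(x)$, then

$$\hat b_{jk} = \langle y, u_{j,k}\rangle = \langle Y, U_{jk}\rangle = \sum_\omega [q_\omega]^{-1}\, \overline{\psi_{jk\omega}}\, Y_\omega, \qquad \hat a_{mk} = \sum_\omega [q_\omega]^{-1}\, \overline{\varphi_{mk\omega}}\, Y_\omega. \qquad (7.6)$$

In the case of the statistical experiment described in formula (7.2), Fourier coefficients are replaced by the discrete Fourier transform.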
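The Fourier-domain recipe in (7.6) can be sketched on a discrete grid as follows. This is a minimal pure-Python illustration, not the Fourier wavelet transform of Johnstone et al. (2004): the function names are hypothetical, the inner product is taken as $\langle g, e_\omega\rangle = \int_0^1 g(t) e^{-i2\pi\omega t}dt$ (an assumption about the convention), integrals are replaced by Riemann sums on the grid $i/n$, and a direct $O(n^2)$ transform is used instead of an FFT for brevity.

```python
import cmath
import math


def fourier_coefficients(values):
    """Discrete analogue of <g, e_w> = int_0^1 g(t) exp(-i 2 pi w t) dt on the grid i/n."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * w * i / n) for i, v in enumerate(values)) / n
        for w in range(n)
    ]


def periodic_convolution(q, f):
    """Riemann-sum version of int_0^1 q(x - t) f(t) dt for periodic samples."""
    n = len(f)
    return [sum(q[(i - t) % n] * f[t] for t in range(n)) / n for i in range(n)]


def wvd_coefficient(Y, q, basis):
    """Estimate <f, basis> from Y = q * f via sum_w Y_w conj(basis_w) / q_w, as in (7.6)."""
    Y_w = fourier_coefficients(Y)
    q_w = fourier_coefficients(q)
    b_w = fourier_coefficients(basis)
    # Division by q_w is the ill-posed step: small |q_w| amplifies noise in Y_w.
    return sum(Y_w[w] * b_w[w].conjugate() / q_w[w] for w in range(len(Y))).real
```

In the noise-free case, where `Y` is exactly the periodic convolution of `q` and `f`, the returned value coincides with the Riemann sum of $f$ times the basis function, provided that no discrete Fourier coefficient of `q` vanishes. With noisy data the same formula yields the estimators $\hat b_{jk}$ or $\hat a_{mk}$, depending on whether `basis` holds samples of $\psi_{jk}$ or $\varphi_{mk}$.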

Note that application of the wavelet-vaguelette methodology to deconvolution with heteroscedastic noise (7.3) is very reasonable. Indeed, if the noise level $\gamma(x)$ is such that $\int \gamma^2(x)\,dx < \infty$, then formula (7.6) implies that wavelet coefficients are estimated using the Fourier transform of the measured signal $Y$ in (7.3) and then thresholded taking into account the local noise level. Indeed, it is easy to observe that $\lambda_{j,k}$ in (7.10) is such that $\lambda_{j,k}^{-2} \asymp \|U_{jk}\|^2 \int \gamma^2(x)\, I(x \in \operatorname{supp} U_{jk})\,dx$. If $\int \gamma^2(x)\,dx = \infty$ in the vicinity of some point $x_0$, the natural strategy suggested above ceases to work and one needs another means for estimating scaling and wavelet coefficients in the neighborhood of $x_0$. This is the situation when one has to apply the hybrid estimator constructed in Sections 5 and 6. In Section 8 we describe in detail how this task can be accomplished. If function $\mu(x)$ is not completely known and is estimated from data, matrices $A^{(m)}$ and $B^{(m)}$ as well as vector $\hat c^{(m)}$ will be subject to additional errors which have to be accounted for by using regularization techniques designed for inverse problems with errors in the operator (see, e.g., Engl, Hanke and Neubauer (2000) and Hoffmann and Reiss (2008)).

In order to find $\lambda_{j,k} \asymp \|u_{j,k}\|^{-1}$ in Assumption (A1) and verify Assumptions (A1)-(A5), we impose the following conditions on the kernel $q$ and mother and father wavelets $\psi$ and $\varphi$.

(E1) Kernel $q(x)$ is $(r - 2)$ times continuously differentiable on $[0, 1]$ and $r_1 \ge r \ge 1$ times differentiable outside the neighborhood of jump discontinuities of $q^{(r-1)}$, with $q^{(r)}$ and $q^{(r_1)}$ uniformly bounded. The value $r = 1$ corresponds to the case when $q$ itself has jump discontinuities.

(E2) Fourier coefficients $q_\omega$ of $q$ are such that $C_{q1}(|\omega| + 1)^{-r} \le |q_\omega| \le C_{q2}(|\omega| + 1)^{-r}$ for some positive constants $C_{q1}$ and $C_{q2}$ independent of $\omega$.

(E3) Let $\psi$ be an $r_0$-regular, $r_0$ times continuously differentiable wavelet function with bounded support, where $r_0 \ge \max(r, r_1)$.

(E4) Kernel $q$ is such that functions $U_{jk}$ and $T_{jk}$ defined in (7.5) have bounded supports of lengths proportional to $2^{-j}$ and centered at $2^{-j}k$: $\operatorname{supp}(U_{jk}) = (2^{-j}(k - d_U),\, 2^{-j}(k + d_U))$ and $\operatorname{supp}(T_{jk}) = (2^{-j}(k - d_T),\, 2^{-j}(k + d_T))$.

There are many functions $q$ satisfying the conditions above, among them, for example,

$$q_1(x) = \sum_{k \in \mathbb{Z}} \exp(-\lambda |x + k|) \quad \text{and} \quad q_2(x) = \sum_{k \ge 0} \exp(-\lambda (x + k))\,(x + k)^N, \qquad (7.7)$$

with $r = 2$ for $q_1(x)$ and $r = N + 1$ for $q_2(x)$.
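To make the first example concrete, the sketch below evaluates $q_1$ and its Fourier coefficients, under the assumption that $q_1(x) = \sum_{k\in\mathbb{Z}} \exp(-\lambda|x+k|)$ as read from (7.7). Summing the two geometric series gives, for $0 \le x \le 1$, the closed form $q_1(x) = (e^{-\lambda x} + e^{-\lambda(1-x)})/(1 - e^{-\lambda})$, and the Fourier coefficients of the periodized kernel equal the Fourier transform of $e^{-\lambda|x|}$ at integer frequencies, $q_\omega = 2\lambda/(\lambda^2 + 4\pi^2\omega^2)$, which decays like $|\omega|^{-2}$ in agreement with $r = 2$ in (E2). The function names are hypothetical.

```python
import math


def q1_truncated(x, lam, terms=60):
    """Partial sum of exp(-lam |x + k|) over k = -terms, ..., terms."""
    return sum(math.exp(-lam * abs(x + k)) for k in range(-terms, terms + 1))


def q1_closed_form(x, lam):
    """Sum of exp(-lam |x + k|) over all integers k, valid for 0 <= x <= 1."""
    return (math.exp(-lam * x) + math.exp(-lam * (1.0 - x))) / (1.0 - math.exp(-lam))


def q1_fourier(omega, lam):
    """Fourier coefficient of q1 on [0, 1]: 2 lam / (lam^2 + (2 pi omega)^2)."""
    return 2.0 * lam / (lam ** 2 + (2.0 * math.pi * omega) ** 2)
```

The product `q1_fourier(omega, lam) * omega**2` tends to the constant $\lambda/(2\pi^2)$ as $|\omega|$ grows, which is one way to check numerically that $\beta = 2r = 4$ for this kernel.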

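Note that under Assumptions (E1)-(E4), one has $\|u_{j,k}\|^2 = \int \mu^{-2}(x)\, U_{jk}^2(x)\,dx = \infty$ if $|k - k_{0j}| \le d_U$ and $\alpha \ge 1$, and

$$\|u_{j,k}\|^2 \asymp \mu^{-2}(2^{-j}k)\, \|U_{jk}\|^2 \asymp |2^{-j}k - x_0|^{-\alpha} \sum_\omega |q_\omega|^{-2}\, |\psi_{jk\omega}|^2 \le C\, 2^{j\alpha}\left[|k - k_{0j}|^{\alpha} + 1\right]^{-1} \sum_\omega (|\omega|^{2r} + 1)\, |\psi_{jk\omega}|^2 \qquad (7.8)$$

otherwise. Due to conditions (E3) and (E4) and periodicity of $\psi_{jk}$, integrating by parts $r$ times, we obtain the following expression for Fourier coefficients of $\psi_{jk}$:

$$\psi_{jk\omega} = \int_0^1 2^{j/2}\, \psi(2^j x - k)\, e^{i 2\pi \omega x}\,dx = 2^{j(r + 1/2)}\, (-2\pi i \omega)^{-r} \int_0^1 \psi^{(r)}(2^j x - k)\, e^{i 2\pi \omega x}\,dx,$$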

so that 
Therefore, 

$$\|u_{j,k}\|^2 \le C\, 2^{j(\alpha + 2r)} \left[|k - k_{0j}|^{\alpha} + 1\right]^{-1}, \quad \text{if } |k - k_{0j}| > d_U \text{ or } \alpha < 1, \qquad (7.9)$$

and $\|u_{j,k}\| = \infty$ otherwise. A similar inequality can be proved for $\|t_{m,k}\|^2$.

Now, we need to show that indeed $\lambda_{j,k}$ is defined by expression (7.9) and that, under conditions (E1)-(E4), Assumptions (A1)-(A5) hold. This is accomplished by the following proposition.

Proposition 1 Let $\mu(x)$ satisfy condition (7.4). Then, under Assumptions (E1)-(E4) with $r_1$ such that $2r_1 + 1 > 2r + \alpha$, one has

$$\lambda_{j,k}^2 \asymp 2^{-j(2r + \alpha)} \left[|k - k_{0j}|^{\alpha} + 1\right]. \qquad (7.10)$$

Furthermore, Assumptions (A1)-(A5) hold with $D = D_0 = 0$ if $\alpha < 1$, and with $D = d_U$ and $D_0 = d_T$ if $\alpha \ge 1$. Conditions (5.9)-(5.11) of Lemma 2 are also valid.

Due to Proposition 1, all statements and constructions in Sections 3-6 can be applied to equation (1.1) with operator $Q$ given in (7.1). In particular, Theorems 1-3 can be utilized with $\beta = 2r$.

By direct comparison with, e.g., Johnstone et al. (2004), one can see that if $\alpha = 0$, so that the problem is spatially homogeneous, then the rates of convergence in Theorems 1-3 coincide with the usual convergence rates exhibited in deconvolution problems with white noise.

7.2 Deconvolution with spatially inhomogeneous design

Consider the problem of deconvolution when measurements are irregularly spaced. In particular, let $g$ be a sampling pdf with the corresponding cdf $G$. Due to irregular design, operator $Q$ can be presented as

$$(Qf)(x) = \int_0^1 q(G^{-1}(x) - t)\, f(t)\, dt, \qquad (7.11)$$

where $G^{-1}$ is the inverse of $G$. In this case, equation (1.1) can be viewed as an idealized version of the equation

$$y_i = \int_0^1 q(x_i - t)\, f(t)\, dt + \sigma \xi_i, \quad i = 1, \dots, n, \qquad (7.12)$$

where $\varepsilon = \sigma^2/n$, $\xi_i$ is a white Gaussian noise and observation points $x_i$, $i = 1, \dots, n$, are such that $G(x_i) = i/n$ and $x_i$'s and $\xi_i$'s are independent. Then, $G^{-1}(i/n) = x_i$, $i = 1, \dots, n$, and the right-hand side of (1.1) with operator $Q$ given by (7.11) provides a continuous equivalent of the statistical problem (7.12).

In what follows, we assume that functions $q(x)$ and $f(x)$ are periodic and both $q(x)$ and $g(x)$ are completely known. In this case, the conjugate operator $Q^*$ is of the form

$$(Q^* h)(z) = \int_0^1 q(G^{-1}(x) - z)\, h(x)\, dx.$$

It is straightforward to show that $u_{j,k}(x) = U_{jk}(G^{-1}(x))$ and $t_{m,k}(x) = T_{mk}(G^{-1}(x))$ where, as before, $U_{jk}(\cdot)$ and $T_{mk}(\cdot)$ are given by formula (7.5). Wavelet coefficients $b_{jk}$ and $a_{mk}$ can also be estimated in a manner similar to Example 1. Indeed, if $Y(x) = y(G(x))$ and $Y_\omega$ are Fourier coefficients of $Y$, then $\hat b_{jk}$ and $\hat a_{mk}$ can be evaluated using formula (7.6).

If design density $g$ is unknown, then both $g(x)$ and $G(x)$ have to be estimated from observations $x_i$, $i = 1, \dots, n$. The latter will lead to additional errors in estimating wavelet coefficients $b_{jk}$ and $a_{mk}$ as well as entries of matrices $A^{(m)}$ and $B^{(m)}$ and vector $\hat c^{(m)}$. The issue of additional errors has to be addressed by using, for example, regularization techniques (see, e.g., Engl, Hanke and Neubauer (2000) and Hoffmann and Reiss (2008)).

We assume that design density $g(x)$ has a single zero of order $\alpha$ at $x_0$, i.e., $g(x_0 + x)\,|x|^{-\alpha} \to C_g$ as $x \to 0$. The latter implies that there exist some absolute constants $C_{g1}$ and $C_{g2}$ such that, for any $x$, one has

$$C_{g1} |x - x_0|^{\alpha} \le g(x) \le C_{g2} |x - x_0|^{\alpha}. \qquad (7.13)$$

Thus, we are considering the case of extremely inhomogeneous design which can be described also as a local data loss.

In this case, under conditions (E1)-(E4), similarly to Example 1, one has $\|u_{j,k}\|^2 = \int g^{-1}(x)\, U_{jk}^2(x)\,dx$. Hence, identically to (7.9), one has

$$\|u_{j,k}\|^2 \le C\, 2^{j(\alpha + 2r)}\, |k - k_{0j}|^{-\alpha}, \quad \text{if } |k - k_{0j}| > d_U \text{ or } \alpha < 1,$$
$$\|u_{j,k}\|^2 = \infty, \quad \text{if } |k - k_{0j}| \le d_U \text{ and } \alpha \ge 1.$$

Moreover, by simple modifications of the proof of Proposition 1, it is easy to show that the following statement is valid.



Proposition 2 Let $g(x)$ satisfy condition (7.13). Then, under Assumptions (E1)-(E4) with $r_1$ such that $2r_1 + 1 > 2r + \alpha$, the value of $\lambda_{j,k}$ is given by formula (7.10). Furthermore, Assumptions (A1)-(A5) hold with $D = D_0 = 0$ if $\alpha < 1$, and with $D = d_U$ and $D_0 = d_T$ if $\alpha \ge 1$. Conditions (5.9)-(5.11) of Lemma 2 are also valid.

Again, analogously to Section 7.1, due to Proposition 2, all statements and constructions in Sections 3-6 can be applied to equation (1.1) with operator $Q$ given in (7.11). In particular, Theorems 1-3 can be used with $\beta = 2r$.

If $q$ is the Dirac delta function, so that $(Qf)(x) = f(G^{-1}(x))$, then $r = 0$ and the problem reduces to regression estimation based on spatially inhomogeneous data studied in Antoniadis, Pensky and Sapatinas (2012). In this case, the rates of convergence coincide with the minimax convergence rates derived therein.

Remark 6 (Irregularly spaced observations and heterogeneous noise) It follows from the examples in Sections 7.1 and 7.2 that there is a direct correspondence between deconvolution with irregularly spaced measurements and deconvolution with heterogeneous noise. In particular, as far as convergence rates are concerned, the squared noise level acts in a similar way to the inverse of the design density and both are equivalent in some way to a multiplicative factor in the convolution operator.



8 Simulation study and real data application 

8.1 Simulation study 




Figure 8.1: True function $H$ (red line) and observed data (blue line) with $\alpha = 1$ (upper left), $\alpha = 2$ (upper right), $\alpha = 2.5$ (lower left) and $\alpha = 3$ (lower right). Here, SNR = 0.8848 for $\alpha = 1$, SNR = 0.0808 for $\alpha = 2$, SNR = 0.0183 for $\alpha = 2.5$ and SNR = 0.0040 for $\alpha = 3$.

Figure 8.2: Thresholded wavelet-vaguelette deconvolution estimator with $n = 1024$. True regression $f$ (red line) and its estimated value (blue line) with $\alpha = 0$ (upper left), $\alpha = 1$ (upper right), $\alpha = 2$ (lower left) and $\alpha = 3$ (lower right).

In order to assess finite sample properties of the proposed methodology and, in particular, performance of the hybrid estimator, we carried out a small simulation study. We limited our attention to the deconvolution in the presence of heteroscedastic noise described in Section 7.1. Specifically, we considered $q(x) = q_1(x)$ with $\lambda = 5$ where $q_1(x)$ is defined in (7.7). We chose one of the standard test functions, blip, as the true function $f(x)$. Function $\mu(x)$ in (7.1) is of the form $\mu(x) = |h^{-1}(x - x_0)|^{\alpha/2}\, I(|x - x_0| < h) + I(|x - x_0| \ge h)$ with $x_0 = 1/3$ and $h = 1/6$, so that condition (7.4) holds. We generated data using equation (7.3) with $\gamma(x) = \mu^{-1}(x)$ and $\sigma = 0.02$, in particular,

$$Y_i = H(i/n) + \sigma \gamma(i/n)\, \xi_i, \quad i = 1, \dots, n, \quad \text{with } H(x) = \int_0^1 q(x - t)\, f(t)\, dt.$$

We evaluate noise intensity by the signal-to-noise ratio (SNR), common in signal processing, which is defined as SNR $= \sqrt{n}\, \operatorname{std}(f)/(\|\gamma\|\, \sigma)$ where $\|\gamma\|$ is the $L^2$-norm of $\gamma$ and $\operatorname{std}(p)$ is the standard deviation of $p$ for any function $p(x)$.
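The data-generating mechanism and the SNR formula above can be sketched in a few lines of plain Python. This is not the authors' Matlab code: the function names are hypothetical, the noise is drawn with Python's `random` module, and $\|\gamma\|$ is approximated by a Riemann sum over the sampled values, which is an assumption since the discretization of the norm is not specified in the text. The SNR values reported in the figures are therefore not necessarily reproduced by this sketch.

```python
import math
import random
import statistics


def mu(x, alpha, x0=1.0 / 3.0, h=1.0 / 6.0):
    """mu(x) = |(x - x0)/h|^(alpha/2) inside (x0 - h, x0 + h) and 1 outside."""
    d = abs(x - x0)
    return (d / h) ** (alpha / 2.0) if d < h else 1.0


def gamma(x, alpha):
    """Noise scale gamma = 1/mu; undefined at x = x0 when alpha > 0."""
    return 1.0 / mu(x, alpha)


def simulate(H, n, alpha, sigma, seed=0):
    """Y_i = H(i/n) + sigma * gamma(i/n) * xi_i, i = 1, ..., n, with standard normal xi_i."""
    rng = random.Random(seed)
    return [H(i / n) + sigma * gamma(i / n, alpha) * rng.gauss(0.0, 1.0)
            for i in range(1, n + 1)]


def snr(f_values, gamma_values, sigma):
    """sqrt(n) * std(f) / (||gamma|| * sigma), with ||gamma|| taken as a Riemann-sum L2 norm."""
    n = len(f_values)
    norm_gamma = math.sqrt(sum(g * g for g in gamma_values) / n)
    return math.sqrt(n) * statistics.pstdev(f_values) / (norm_gamma * sigma)
```

Note that the singularity point $x_0 = 1/3$ never falls on the grid $i/n$ when $n$ is a power of two, so `gamma` stays finite at all sampling points even though it becomes very large next to $x_0$ for large $\alpha$.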

We used the WaveLab package for Matlab and carried out simulations using degree 8 Daubechies wavelets and $n = 1024$. In order to obtain estimators of wavelet and scaling coefficients, we generated wavelet and scaling functions $\psi_{jk}$ and $\varphi_{mk}$ using the MakeWavelet command and obtained a respective matrix of the Fourier coefficients. Subsequently, we found estimators of wavelet and scaling coefficients using formula (7.6) with $Y_\omega$ being the discrete Fourier transform of vector $Y$ in (7.3). We generated values of $\|u_{j,k}\|^2$ using equation (7.8) and used them for hard thresholding. Due to the relatively small value of $n$, we did not use block thresholding described in Section 4. By applying the inverse wavelet transform to the thresholded wavelet coefficients, we obtained the deconvolution estimator $\hat f$.

Figure 8.3: Hybrid estimation with $\alpha = 2.5$. True $H$ (red line) and observed data (blue line) (upper left), true $f$ (red line) and wavelet-vaguelette estimator (green line) (upper middle), true $f$ (red line) and hybrid estimators (blue line) with $m = 2$ (upper right), $m = 3$ (lower left), $m = 4$ (lower middle) and $m = 5$ (lower right). Lepskii method selects estimator with $\hat m = 4$ (lower middle).

We evaluated performance of the estimators for $n = 1024$ and different values of $\alpha$. As expected, when $\alpha$ grows, the SNRs decrease and the quality of the observed data declines. Figure 8.1 demonstrates observed data for various values of $\alpha$. The corresponding signal-to-noise ratios are SNR = 0.8848 for $\alpha = 1$, SNR = 0.0808 for $\alpha = 2$, SNR = 0.0183 for $\alpha = 2.5$ and SNR = 0.0040 for $\alpha = 3$. Figure 8.2 shows wavelet-vaguelette deconvolution estimators (4.14) obtained for $\alpha = 0$, $\alpha = 1$, $\alpha = 2$ and $\alpha = 3$. As the values of $\alpha$ grow, SNR declines and the wavelet-vaguelette estimators (4.14) deteriorate. If $\alpha = 3$, the wavelet-vaguelette reconstruction has little resemblance to the regression function which it estimates. Note that for moderate values of $\alpha$, the wavelet-vaguelette estimator adjusts to spatially inhomogeneous noise quite well. Indeed, fluctuations at the right end of the graph appear even when $\alpha = 0$ (upper left) and are due to the relatively crude choice of threshold in formula (4.14). Actually, for $\alpha = 0$ the noise ceases to be inhomogeneous and estimator (4.14) reduces to the Fourier-wavelet estimator of Johnstone, Kerkyacharian, Picard and Raimondo (2004).

For large values of $\alpha$, we construct hybrid estimators described in Sections 4 and 6. Construction of the adaptive hybrid estimator consists of the following steps.

1. Fix the lowest resolution level $m_1$ and the highest resolution level $J$. For each value of $m = m_1, \dots, J - 1$, repeat steps 2 through 6.

2. Obtain the wavelet-vaguelette estimator of $f$ with the lowest resolution level $m$ using formula (4.14).

3. Identify sets $K_{0m}$, $K_{0m}^c$, $K_{1j}$ and $K_{1j}^c$ for $j = m, \dots, J - 1$. Also, find set $\Omega_m$ in (6.1).

4. Form matrices $A^{(m)}$ and $B^{(m)}$ and vector $\hat c^{(m)}$ using formulae (4.19) and (4.20), respectively, and obtain solution $\hat z^{(m)}$ of the system of equations (4.23). Use vector $\hat z^{(m)}$ as coefficients $\hat a_{mk}$, $k \in K_{0m}$, in the zero-affected portion of the estimator (4.25).

5. Replace estimators of the scaling coefficients (if $k \in K_{0m}$) and wavelet coefficients (if $k \in K_{1j}$, $j = m, \dots, J - 1$) by zeros to obtain the zero-free portion of the estimator (4.14).

6. Combine wavelet coefficients in steps 4 and 5 to obtain wavelet coefficients of $\hat f_m$. Recover $\hat f_m$ using the inverse wavelet transform. Set

$$\lambda_m^{-2} = \sigma^2 \sum_k \mu^{-2}(2^{-m}k)\, \|T_{mk}\|^2.$$

7. For each $m = m_1, \dots, J - 1$, and $j = m + 1, \dots, J - 1$, evaluate the matrix of the adjusted differences

$$C_{mj} = \|(\hat f_m - \hat f_j)\, I(\Omega_m)\|^2 \Big/ \left(\lambda_j^{-2}\, n^{-1} \log n\right).$$

Choose $\hat m$ in (6.2) as

$$\hat m = \min\left\{m : C_{mj} \le \kappa^2 \ \text{ for all } j,\ m < j \le J - 1\right\}, \qquad (8.2)$$

i.e., by comparing the maximum value of row $m$ of matrix $C$ with a constant $\kappa^2$.

Figure 8.4: Hybrid estimation with $\alpha = 3$. True $H$ (red line) and observed data (blue line) (upper left), true $f$ (red line) and wavelet-vaguelette estimator (green line) (upper middle), true $f$ (red line) and hybrid estimators (blue line) with $m = 2$ (upper right), $m = 3$ (lower left), $m = 4$ (lower middle) and $m = 5$ (lower right). Lepskii method selects estimator with $\hat m = 4$ (lower middle).
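The selection rule in step 7 can be sketched as follows. This is an illustrative pure-Python version under stated assumptions: the estimators are assumed to be available as lists of values on the grid $i/n$, the squared norm is approximated by a Riemann sum, the adjusted differences are stored in a nested dictionary `C[m][j]`, and the names `adjusted_difference` and `lepski_level` as well as the value of `kappa` used in any call are hypothetical (the text does not report the constant used in the simulations).

```python
import math


def adjusted_difference(f_m, f_j, in_omega_m, lam_inv_sq_j, n):
    """C_mj = ||(f_m - f_j) I(Omega_m)||^2 / (lambda_j^{-2} n^{-1} log n), norm by Riemann sum."""
    sq_norm = sum(
        (a - b) ** 2 for a, b, keep in zip(f_m, f_j, in_omega_m) if keep
    ) / n
    return sq_norm / (lam_inv_sq_j * math.log(n) / n)


def lepski_level(C, levels, kappa):
    """Smallest m in `levels` such that C[m][j] <= kappa^2 for every j > m in `levels`."""
    if not levels:
        raise ValueError("levels must be non-empty")
    for m in sorted(levels):
        if all(C[m][j] <= kappa ** 2 for j in levels if j > m):
            return m
    # Not reached: the largest level has no j > m, so the condition holds vacuously there.
```

For example, if every entry in rows $m \le 3$ exceeds 93 and every entry in rows $m \ge 4$ is below 0.2, any threshold $\kappa^2$ between these two values makes the rule return $\hat m = 4$.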

In our simulations, we chose $n = 1024$, $m_1 = 1$ and $J = 7$ and carried out hybrid estimation with $\alpha = 2.5$, $\alpha = 3$ and $\alpha = 4$. Simulation results for these three cases are presented in Figures 8.3, 8.4 and 8.5, respectively. The upper left figures represent the observed noisy data, the upper middle figures exhibit reconstructions of $f$ by the wavelet-vaguelette estimator, while the rest of the figures display hybrid estimators of $f$ for resolution levels $m = 2$ to $m = 5$. Observe that for $\alpha = 2.5$ the wavelet-vaguelette estimator still generally follows the true function $f$ but for $\alpha = 3$ or $\alpha = 4$ it bears little resemblance to $f$. The hybrid estimator allows one to account for inhomogeneity of the noise and to significantly improve reconstruction of $f$.

The Lepskii procedure provides a choice of resolution level $\hat m$ for each of the values of $\alpha$. In the case of $\alpha = 2.5$, the row maximum exceeds 93 for $m \le 3$ and is below 0.2 if $m \ge 4$. In the case of $\alpha = 3$, the row maximum exceeds 20 for $m \le 3$ and is below 0.3 if $m \ge 4$. In both of these cases, the Lepskii method chooses $\hat m = 4$. If $\alpha = 4$, the row maximum is very large for $m \le 2$ and is below 3 for $m \ge 3$, so that $\hat m = 3$. Figures 8.3, 8.4 and 8.5 confirm that the Lepskii procedure makes correct choices.




Figure 8.5: Hybrid estimation with $\alpha = 4$. True $H$ (red line) and observed data (blue line) (upper left), true $f$ (red line) and wavelet-vaguelette estimator (green line) (upper middle), true $f$ (red line) and hybrid estimators (blue line) with $m = 2$ (upper right), $m = 3$ (lower left), $m = 4$ (lower middle) and $m = 5$ (lower right). Lepskii method selects estimator with $\hat m = 3$ (lower left).

8.2 Real data application 

Below, we consider application of the hybrid estimator developed in the paper to recovery of a 
convolution signal transmitted via Amplitude Modulation described in Example El Mathematically, 
the problem reduces to deconvolution with a spatially inhomogeneous kernel in Section 7.1 and
appears in the form of equation (7.2) with

$\mu(x) = \cos(2\pi\omega x + \theta)$ (8.3)

with $\omega \approx n/2$ and $\theta \in [0, 2\pi]$. We chose $\omega = n/2 + 1$ and expressed $\theta$ as $\theta = 3\pi/2 - 2\pi\theta_0$. Then,
$\mu(x)$ can be presented as $\mu(x) = \sin(2\pi(n/2 + x - \theta_0))$, so that $\mu(x)$ has two zeros of order $\alpha = 2$,
$x_{01}$ and $x_{02}$, in [0,1].

For simplicity, we considered the same setup as in the simulation example, that is, we used
$q(x) = q_\lambda(x)$ with $\lambda = 5$, where $q_\lambda(x)$ is defined in (7.7), and one of the standard test functions, blip,
as the true function $f(x)$. We carried out simulations with n = 512, $\theta_0 = 1/3$ (so that $x_{01} = 1/3$
and $x_{02} = 5/6$), $\sigma = 0.01$ and degree 8 Daubechies wavelets. The locations of the zeros were estimated
from the data. One of the zeros was placed at the position where the value of the original signal is
minimal in absolute value, and the other zero was placed 1/2 units away from the first zero. In our
study, the locations of the zeros were estimated as $\hat{x}_{01} = 0.33496$ and $\hat{x}_{02} = 0.83496$.
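The rule for locating the zeros can be sketched as follows. This is only an illustration under stated assumptions: the signal is sampled at the points i/n, the first zero is taken at the grid point where the absolute value of the sampled signal is smallest, and the second zero is placed 1/2 away (modulo 1). The function name `estimate_zero_locations` and the noiseless test signal are hypothetical and are not taken from the paper.

```python
import numpy as np

def estimate_zero_locations(signal):
    """Return two estimated zero locations in [0, 1), sorted in increasing order.

    The first zero is the grid point i/n minimizing |signal[i]|;
    the second is placed 1/2 away from it, wrapped to [0, 1).
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    i_min = int(np.argmin(np.abs(signal)))
    first = i_min / n
    second = (first + 0.5) % 1.0
    return tuple(sorted((first, second)))

# Toy usage with a noiseless sinusoid whose zeros are at 1/3 and 5/6.
n = 512
x = np.arange(n) / n
toy_signal = np.sin(2 * np.pi * (x - 1.0 / 3.0))
zeros = estimate_zero_locations(toy_signal)
```

On a grid of n = 512 points the estimates can only be accurate up to the grid spacing 1/n, which is the order of the discrepancy between the estimated and the true zero locations reported above.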

The top row of Figure 8.6 presents signal y with uniform noise generated according to equation
(7.2) as well as signal Y with heteroscedastic noise obtained according to equation (7.3) by
dividing equation (7.2) by $\mu(i/n)$. The bottom row of Figure 8.6 shows the wavelet-vaguelette
deconvolution estimator and the hybrid estimator. It is easy to see that the wavelet-vaguelette
deconvolution estimator delivers a poor reconstruction while the hybrid technique produces a much
more precise estimator of the unknown signal $f$.
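The effect of dividing by $\mu(i/n)$ on the noise can be made concrete with a small sketch. It assumes additive noise with constant standard deviation $\sigma$ before the division, so that the standard deviation after the division is $\sigma/|\mu(i/n)|$; the modulation used below is a plain sinusoid with zeros at 1/3 and 5/6, chosen only for illustration. The function name `noise_std_after_division` is hypothetical.

```python
import numpy as np

def noise_std_after_division(mu_values, sigma):
    """Standard deviation of homogeneous noise (level sigma) after dividing
    the observations pointwise by mu_values; infinite where mu is exactly 0."""
    mu_abs = np.abs(np.asarray(mu_values, dtype=float))
    with np.errstate(divide="ignore"):
        return sigma / mu_abs

# Toy usage: the noise level explodes near the zeros of the modulation.
n = 512
sigma = 0.01
x = np.arange(n) / n
mu = np.sin(2 * np.pi * (x - 1.0 / 3.0))
std = noise_std_after_division(mu, sigma)
worst_location = x[np.argmax(std)]
```

The noise level never drops below $\sigma$ and grows by orders of magnitude in the vicinity of the zeros, which is the locally extreme noise that the hybrid estimator is designed to handle.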




Figure 8.6: Observed values of the signal and estimators of the true function. Top left: true signal
(red line) and observed data with homogeneous noise (blue line). Top right: true signal divided by
$\mu$ (red line) and observed data with heteroscedastic noise (blue line). Bottom left: true function
$f$ (red line) and wavelet-vaguelette estimator (blue line). Bottom right: true function $f$ (red line)
and hybrid estimator (blue line). Here, $\theta_0 = 1/3$, noise level $\sigma = 0.01$.



9 Discussion 

In the present paper, we consider estimation of a solution of a spatially inhomogeneous linear inverse
problem (1.1) with possible singularities. The special feature of problems like this is that the degree
of ill-posedness depends not only on the scale but also on location. In spite of a huge number of
publications devoted to linear inverse problems, to the best of our knowledge, this type of problem
has never been treated before. Spatially inhomogeneous ill-posed problems appear naturally when
the noise level is location dependent or observations are irregularly spaced. We consider a version
of a spatially inhomogeneous problem where there exists a singularity point $x_0$ such that the norm
of the solution grows when the right-hand side is localized in the vicinity of $x_0$. This assumption
corresponds to the situation of locally extreme noise and extremely inhomogeneous design. We
also assume that the unknown function $f$ belongs to a Besov space and characterize ill-posedness
and spatial inhomogeneity of operator Q in terms of wavelet-vaguelette decomposition. The novel
feature here is that the norms of vaguelettes depend on location and may be infinite in the vicinity
of a singularity point, so that SVD-type solutions cease to work.

For this reason, estimators obtained in the paper are based either on wavelet-vaguelette
decomposition (if the norms of all vaguelettes are finite) or on a hybrid of wavelet-vaguelette
decomposition and Galerkin method (if vaguelettes in the neighborhood of the singularity point have infinite
norms). The hybrid estimator is a combination of a linear part in the vicinity of the singularity
point and the nonlinear block thresholding wavelet estimator elsewhere. To attain adaptivity, we
first choose an optimal resolution level for the linear, singularity affected, portion of the estimator
using the Lepskii (1990, 1999) method and then use this resolution level as the lowest for the nonlinear
wavelet estimator. We show that, up to a logarithmic factor, the hybrid estimator attains the
asymptotically optimal (in the minimax sense) convergence rates.
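The block thresholding step of the nonlinear part can be sketched generically. The code below is an illustration, not the estimator of the paper: it assumes that the empirical wavelet coefficients of one resolution level are stored in a flat array, that blocks are consecutive and of equal length, and that one threshold per block is supplied by the caller. The names `block_threshold`, `block_len` and `thresholds` are hypothetical.

```python
import numpy as np

def block_threshold(coeffs, block_len, thresholds):
    """Keep a whole block of coefficients if its sum of squares reaches the
    block's threshold; otherwise set the whole block to zero."""
    coeffs = np.asarray(coeffs, dtype=float)
    out = np.zeros_like(coeffs)
    for l, start in enumerate(range(0, coeffs.size, block_len)):
        block = coeffs[start:start + block_len]
        if np.sum(block ** 2) >= thresholds[l]:
            out[start:start + block_len] = block
    return out

# Toy usage: the first block carries signal, the second one only small noise.
kept = block_threshold([3.0, 4.0, 0.1, 0.1], block_len=2, thresholds=[1.0, 1.0])
```

Deciding on whole blocks rather than on individual coefficients pools information from neighboring coefficients, which is the usual motivation for block rules.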

The theory presented in the paper is supplemented by examples of deconvolution with a
spatially inhomogeneous kernel and deconvolution in the presence of locally extreme noise or extremely
inhomogeneous design. The first two problems are examined via a limited simulation study which
demonstrates advantages of the hybrid estimator when the degree of spatial inhomogeneity is high.
In addition, we apply the technique to recovery of a convolution signal transmitted via amplitude
modulation.

We note that the wavelet-based estimation procedure presented in the paper is motivated
by the need to construct an asymptotically optimal estimator in the case when the unknown
function $f$ is spatially inhomogeneous. The estimator uses a relatively crude thresholding procedure
which can be improved by applying more sensitive thresholding techniques. Moreover, one can
possibly find more efficient computational procedures than the hybrid estimator if establishing
asymptotic optimality is not a priority.

Finally, in the paper, we consider only the simplest case when the unknown function is univariate
and is defined on an interval. The problem can be naturally extended to the case of a multivariate
function $f$ which belongs to an isotropic or anisotropic Besov space. The paper assumes that the
operator Q in (1.1) is completely known. However, if this is not true, it will be interesting to investigate
how uncertainty about Q affects the rates of convergence. Also, although the hybrid estimator
works adequately when Q is completely known, it would require appropriate modifications if Q is
partially unknown. However, all these extensions will be a matter of future investigation.

10 Proofs

10.1 Lower bounds

Proof of Theorem 1. The rates are derived by standard methods described in, e.g., Tsybakov
(2009). For this reason, we shall provide a very brief proof.

The main idea of the proof is based on Lemma A.1 of Bunea, Tsybakov and Wegkamp (2007).
In order to show that, for some C and H > 0,

$R_\varepsilon(B^s_{p,q}(A)) \ge CH,$ (10.1)

one needs to find a subset of functions $\mathcal{F} \subset B^s_{p,q}(A)$ such that for any pair $f_1, f_2 \in \mathcal{F}$,

$\|f_1 - f_2\|^2 \ge 4H$ (10.2)

and the Kullback-Leibler divergence

$K(P_{f_1}, P_{f_2}) = 0.5\,\varepsilon^{-1}\|q_1 - q_2\|^2 \le \ln \mathrm{card}(\mathcal{F})/16.$ (10.3)

We consider two cases here: the strongly inhomogeneous and the weakly inhomogeneous cases. In
the strongly inhomogeneous case, the hardest set to estimate is the finite set of functions which are
concentrated around the singularity point. In the weakly inhomogeneous case, this set is comprised
of functions which are uniformly distributed over some resolution level.



The strongly inhomogeneous case. Consider a set of functions $\mathcal{F} = \{\gamma_j\psi_{jk} : |k - k_{0j}| < K\}$
where $K$ is a fixed positive constant. Then, $\mathrm{card}(\mathcal{F}) \ge 2K - 1$. In order that $f \in B^s_{p,q}(A)$, one needs
$\gamma_j \le A2^{-js'}$, so set $\gamma_j = A2^{-js'}$. It is easy to check that for $f_i = \gamma_j\psi_{jk_i}$, $i = 1, 2$, one has



|qi - 02 1 



l]\\v jhl - v jk2 \\ < (7 7 |(A| fel + X% 2 ) x 2-^) 7 J, 



by P^jl . since |/e-fe j| < X . Hence, it follows from (fT0~3l) that j is such that 2 j x (A 2 /e) 1 /( 2s '+a+/ 3 ). 
Note that $\|f_1 - f_2\|^2 = 2\gamma_j^2$, so that $H = 0.5\,A^2 2^{-2js'}$ and, therefore,

$R_\varepsilon(B^s_{p,q}(A)) \ge C A^{\frac{2(\alpha+\beta)}{2s'+\alpha+\beta}}\, \varepsilon^{\frac{2s'}{2s'+\alpha+\beta}}.$ (10.4)

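The weakly inhomogeneous case. Consider a set $\Omega^*$ of binary sequences of length $N = 2^j$, $j \ge 3$:

$\Omega^* = \{\omega = (\omega_0, \ldots, \omega_{N-1}),\ \omega_k \in \{0, 1\}\} = \{0, 1\}^N$

and a corresponding set of functions

$\mathcal{F}^* = \Big\{ f_\omega = \gamma_j \sum_{k=0}^{N-1} \omega_k \psi_{jk},\ \omega \in \Omega^* \Big\}.$

Let $\rho(\omega, \omega') = \sum_k I(\omega_k \ne \omega'_k)$. Then, for any $\omega, \omega' \in \Omega^*$, one has $\|f_\omega - f_{\omega'}\|^2 = \gamma_j^2\, \rho(\omega, \omega')$.

By the Varshamov-Gilbert Lemma (see Lemma 2.9 of Tsybakov (2009)), there exists a subset $\Omega = \{\omega^{(0)}, \ldots, \omega^{(M)}\}$
of $\Omega^*$ such that $M \ge 2^{N/8}$ and $\rho(\omega, \omega') \ge N/8$ for any distinct $\omega, \omega' \in \Omega$.
Consider the subset $\mathcal{F} = \{f_\omega \in \mathcal{F}^* : \omega \in \Omega\}$. Then, for any distinct $f_\omega, f_{\omega'} \in \mathcal{F}$, one has

$\|f_\omega - f_{\omega'}\|^2 \ge 2^j \gamma_j^2/64, \qquad \ln \mathrm{card}(\mathcal{F}) \ge 2^j \ln 2/8.$ (10.5)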
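The identity $\|f_\omega - f_{\omega'}\|^2 = \gamma_j^2 \rho(\omega, \omega')$ and the notion of a well-separated subset of binary sequences can be checked numerically for small $N$. The sketch below makes two assumptions that are not part of the proof: the orthonormal wavelets $\psi_{jk}$ are replaced by the standard basis vectors of $\mathbb{R}^N$, and the separated subset is produced by a simple greedy search, which is a swapped-in technique and not the argument behind the Varshamov-Gilbert Lemma. All function names are hypothetical.

```python
import itertools
import numpy as np

def hamming(w1, w2):
    """Hamming distance between two binary sequences of equal length."""
    return int(np.sum(np.asarray(w1) != np.asarray(w2)))

def f_omega(omega, gamma):
    """Coefficient vector of f_omega in an orthonormal system: gamma * omega_k."""
    return gamma * np.asarray(omega, dtype=float)

def greedy_separated_subset(n_bits, min_dist):
    """Scan all binary sequences in lexicographic order and keep those whose
    Hamming distance to every previously kept sequence is at least min_dist."""
    chosen = []
    for cand in itertools.product((0, 1), repeat=n_bits):
        if all(hamming(cand, w) >= min_dist for w in chosen):
            chosen.append(cand)
    return chosen

# Toy usage with N = 8 and minimal distance 3 (larger than N/8 = 1).
N, gamma = 8, 0.5
subset = greedy_separated_subset(N, min_dist=3)
```

For such a small $N$ the greedy subset already contains more than $2^{N/8}$ elements; the lemma guarantees that a subset with the stated cardinality and separation exists for every $N \ge 8$.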



Since $f_\omega \in B^s_{p,q}(A)$, we set $\gamma_j = A2^{-j(s+1/2)}$. Now it remains to determine the relationship between $j$
and $\varepsilon$. For this purpose, note that, by (2.3) and (2.6),



$\|Qf_\omega - Qf_{\omega'}\|^2 = \gamma_j^2 \Big\| \sum_{k=0}^{2^j-1} (\omega_k - \omega'_k)\, v_{j,k} \Big\|^2 \le C\, 2^{-j(\alpha+\beta)}\, 2^{j(\alpha+1)}\, \gamma_j^2.$



Hence, relations (10.3) and the second inequality in (10.5) imply that $2^j \asymp (A^2/\varepsilon)^{1/(2s+\beta+1)}$. Therefore,
by (10.1) and the first inequality in (10.5), one has

$R_\varepsilon(B^s_{p,q}(A)) \ge C A^{\frac{2(\beta+1)}{2s+\beta+1}}\, \varepsilon^{\frac{2s}{2s+\beta+1}}.$ (10.6)
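The exponents of $\varepsilon$ in the two lower bounds (10.4) and (10.6) can be compared numerically. The sketch below assumes the usual convention $s' = s + 1/2 - 1/p$ and labels the regimes by the sign of $2s(\alpha - 1) - (\beta + 1)(1 - 2/p)$, as in the case distinctions used in the proofs; the function names are hypothetical.

```python
import math

def lower_bound_exponents(s, p, alpha, beta):
    """Return the exponents of epsilon in (10.4) and (10.6), in that order.

    Assumes s' = s + 1/2 - 1/p.
    """
    s_prime = s + 0.5 - 1.0 / p
    exp_104 = 2 * s_prime / (2 * s_prime + alpha + beta)
    exp_106 = 2 * s / (2 * s + beta + 1)
    return exp_104, exp_106

def regime(s, p, alpha, beta):
    """Classify the operator by comparing 2s(alpha-1) with (beta+1)(1-2/p)."""
    lhs = 2 * s * (alpha - 1)
    rhs = (beta + 1) * (1 - 2.0 / p)
    if math.isclose(lhs, rhs, abs_tol=1e-12):
        return "moderately"
    return "strongly" if lhs > rhs else "weakly"

# Toy usage: s = 2, p = 2, beta = 1 and three values of alpha.
e_strong = lower_bound_exponents(2, 2, 3, 1)
e_equal = lower_bound_exponents(2, 2, 1, 1)
```

Since $\varepsilon \to 0$, the bound with the smaller exponent is the larger of the two; at the boundary between the regimes the two exponents coincide.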



Now, to complete the proof of Theorem 1, note that the lower bound given by formula (10.6)
dominates the one given by (10.4) if $2s(\alpha - 1) > (\beta + 1)(1 - 2/p)$ and vice versa.

10.2 Supplementary large deviation results

Lemma 4 Let r 2 = C u C x (V2x + l f ■ Th en 



(bjk ~ b jk ) 2 > t$R j1£ < e* 2 



(10.7) 



where Rji e is defined in 13\ ). 

Proof of Lemma [4l Consider the set of vectors 

Mji = I "k, k G Uji : ^2 \u k \ 2 < 1 

and the centered Gaussian process defined by 

keUji 

The proof of the lemma is based on the following inequality: 

Lemma 5 (Cirelson, Ibragimov & Sudakov (1976)). Let $D$ be a subset of $\mathbb{R} = (-\infty, \infty)$,
and let $(\xi_t)_{t \in D}$ be a centered Gaussian process. If $E(\sup_{t \in D} \xi_t) \le B_1$ and $\sup_{t \in D} \mathrm{Var}(\xi_t) \le B_2$,
then, for all $x > 0$, we have

$P\Big(\sup_{t \in D} \xi_t \ge x + B_1\Big) \le \exp\big(-x^2/(2B_2)\big).$ (10.8)
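The inequality (10.8) can be illustrated by a small Monte Carlo experiment. The sketch below is only a sanity check under simplifying assumptions: the Gaussian process consists of finitely many independent standard normal variables, so that $B_2 = 1$, and $B_1$ is replaced by the Monte Carlo average of the supremum rather than by an analytic bound. The function name `gaussian_sup_tail_check` and all parameter values are hypothetical.

```python
import numpy as np

def gaussian_sup_tail_check(n_points=50, n_sims=20000, xs=(0.5, 1.0, 1.5), seed=0):
    """Compare the empirical tail P(sup >= x + B1) with exp(-x^2 / (2 * B2))
    for the maximum of n_points independent N(0, 1) variables.

    B1 is estimated by the sample mean of the supremum; B2 = 1 exactly.
    Returns a list of tuples (x, empirical_tail, bound).
    """
    rng = np.random.default_rng(seed)
    sup = rng.standard_normal((n_sims, n_points)).max(axis=1)
    b1 = sup.mean()
    b2 = 1.0
    return [
        (x, float(np.mean(sup >= x + b1)), float(np.exp(-x ** 2 / (2 * b2))))
        for x in xs
    ]

results = gaussian_sup_tail_check()
```

In this setting the empirical tail probabilities lie well below the bound, which reflects the fact that (10.8) is a general inequality and is not tight for this particular process.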



To apply Lemma 5, note that



sup Zji(u) 



Hence, by Jensen's inequality, we derive that 

Bi < 

Also, by assumption (A2), 



^2 \ h i k ~ M 2 



keu n 



1/2 



C « e E X J,t 



1/2 



< \J C U C\ Rji e . 



B2 = sup Yai(Zji{y)) = e sup 



feet/,-, 



< eC u max A ■ fc < C U C\ Rji £ . 
keUji J > 



Therefore, by applying Lemma [5] with x 2 = 2y 2 C u C\Rji £ and noting that, therefore, x + B\ = tq, 
we obtain (10.7).

Lemma 6 Let niQ and rh be given by formulae |5, tJ$ and h6.2\) . respectively, and be the

solution of the system of equations given by {^.2$ . Let assumptions of Lemma hold. Denote 
rime = Kn v ' £ m (£ -1 )- If in > tuq, then, for any v > 0, as e — > 0, 



|z(™)_ z (™)|| >ur hne 



where 



C v = 32D 2 C 2 A2 max(C; A 1 C 2 2 , C t K x \ 



(10.9) 



;io.io) 



and constants Ca2, C p \, C w2 , Ct and K\ are defined in 115. 8\) , 115. 9\) . J<5,7| ), \2.J$ and A5.1(J\) , 
respectively. 

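Proof of Lemma 6. Observe that for any m, by (4.24), one has

$\|\hat{\mathbf{z}}^{(m)} - \mathbf{z}^{(m)}\| \le \|(\mathbf{A}^{(m)})^{-1}(\hat{\mathbf{c}}^{(m)} - \mathbf{c}^{(m)})\| + \|(\mathbf{A}^{(m)})^{-1}\mathbf{B}^{(m)}(\hat{\mathbf{h}}^{(m)} - \mathbf{h}^{(m)})\| + \|(\mathbf{A}^{(m)})^{-1}\mathbf{r}^{(m)}\|.$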

Recall that, by ([UnO]) and QHI31), inequality ||(A( m )) -1 r( m )|| 2 < 2DC 2 A2 A 2 K 2 2~ 2ms ' holds, and 
observe that, for m > mo, ||(A(' m )) _1 r( m ) || = o(r] me ) as e — )• 0. Therefore, as e — )• 0, 

P (|| Z M - z ( m ')|| > 1/7^) < P (||(A( m ))- 1 (c^ - c( m ))|| > 0.51/7^(1 - o(l))) 
+ p(||(A( m ))- 1 B( m )(h( m ) -h( m ))|| > 0.51/7/^(1- o(l))) =Pi + P 2 . 



Here, since c( m ) — c^" 1 ) is a (2Do)-dimensional normal vector with zero mean and the component 
variances Var(c[ m) ) = e||m mi ;|| 2 , using formulae (|5.7|) . (|5.9p and (|5.8p . one derives, as e — > 0, 



Pl < p ||6("0_ C W || > 0.5^^^(1-0(1)) 



:io.n) 



< 2L>n max 



(jcf m) - 4 m) \ > {±C A2 DvY l yJc p xeHe->Pm(l ~ °(1))) < 2D Q ^ C ^.. 



where C l/1 =32D 2 C 2 A2 C 2 w2 C; x 1 . 



For the P2 term, note that £ = B( m )(h( m ) — h( m )) is a (2Z?o)-dimensional normal vector with 
zero mean and the component variances 



Var(&) = e 



Wm.lj 



by Assumption (A4) and inequality (|5.10p . Then, using (|5.8p . similarly to the case of Pi, one obtains 

P 2 < p(||B( m )(h( m )-h( m ))||> 0.5^^^(1-0(1))' 
< 2D max P (|&| > v ^C A2 D )- 1 p 2 m rj me (l - o(l))) 



< 2D max P (|£i|A/V^6 > v ^C A2 D Q )' 1 V(C^i)" 1 m^" 1 ) ) < 2D^I C ^ , (10.12) 



where C„ 2 = 32Z)QC^ 2 Cii , Ci. Combination of (10.11) and (10.12) completes the proof of the lemma.

10.3 Proofs of statements in Section 5

Proof of Lemma 1 is based on the following statements.

Lemma 7 Let 1 < p,q < oo, s > max (1/2, 1/p), and Assumptions (Al)~(A4) hold. Let Bji and 
Rji £ be defined in {4-l<ty an d M-l>fy > respectively. Then, 

J-i 

sup Y, E ^HBjuRjie) < CA(e) M*' 1 )]", ( 10 - 13 ) 

where A(e) and p are defined in $3. 3\) and A5.5\) . respectively. 
Proof of Lemma First, note that 

C a [l j£ ] {l - a)+ , if a 7U, 



E R i le ~ R h - 



1&U 4 



C a ln(l je ), if a 7^ 1. 



(10.14) 



Here, x + = max(x,0), lj e is defined in formula (j4.8|) and constant C a depends on a only. Recall 
also that if / £ B^ q (A), then 



2^-1 



J2 h )k < C*2" 2 ^*, \b jk \ < A2~i s '. 



(10.15) 



k=0 

Note that it follows from (110. 14j) and (I10.15|) that 

Di^i) = R & - 2C Q 2^ +max ( 1 ' a » e pn(e- 1 )] I ( a = 1 )-(°'- 1 )+ J 

j=m leUj 

J-l 

D2U2) = Y.Y. B i^ c * 2 ~ 2j2S '- 

3=32 leUj 

We consider the strongly inhomogeneous and the weakly inhomogeneous cases separately. 

The strongly inhomogeneous case. Let 2s(a — 1) > (/3 + 1)(1 — 2/p). Choose ji so that 



Di(ji) < Ce^+^+1, i = l,2. 
It is easy to see that (jl0.16p holds if j% and ji are such that 

a + .fl I(n=l) s' 

2 Jl = £ (max(l,a) + 0)(2s' + a + /3) [hl(e~ )] 2 3 ' + Q + /3 ; 2 j2 =£ s* (2s' + > 



Now, we need to evaluate 



32-1 



j=h leUj 

Consider cases of a > 1, a = 1 and a < 1 separately. 

If a > 1, then p > 2 since, otherwise, s* = s' and D^(ji, j%) = 0. Observe that 



(10.16) 
(10.17) 
(10.18) 



32-1 

D 3 (ji,h)< E 

j=ji + l 



32-1 

5 E 

3=31+1 



E E % + E ^ 

\l\<Nj keUji \l\>Nj 
E [^-^(e-^l^/P + C e 2- J '( Q+/3 )[iV j ln(e- 1 )] 1 ^ Q 



Ak-kojl^Njlnie- 1 ) 



(10.19)

The two terms are of equal order if

N, = C [He- 1 )}- 1 
By direct calculations, one can check that 



. 2 i(2s'+a+/3) 



p/{pa-2) 



32-1 



where 



D 3 (jx,h) < C 2i7e_pQ - 2 

3=31+1 

_ 2s' + (3(1 - 2/p) - 2as 
7 ~ a - 2/p ' 



(10.20) 



(10.21) 



(10.22) 



p-2 2s 

Note that 7 < in (|10.22p since a > 1 > 2/p. Therefore, D 3 (j 1 ,j 2 ) < C2 Jl7l £p<*- 2 = Ce^w 
and p = in (|1U.13|) . 

If a = 1, then p < 2, so that s* = s' and 2 J1 and 2 n differ by logarithmic factor only. Then, 
D 3 (ji,j 2 ) < C2~ 2 ^ s ' and (flimj) holds with p of the form ([B3L 
If a < 1, then p < 2. Note that, in this case, 



p/2 



f/ 2 = I E «& 



and, therefore, 



j'2-1 

D 3 (ji,j 2 )< £ 
j'=ii+i 

j'2-1 



E E%» + E B f < p/2 



|Z|<iVj fcGf/ji 



U|>JV,' 



(10.23) 



<C ^ efiVj ln^" 1 )] 1 -" 2 J '( Q+/? ) + (e [m^ 1 )] 1 -^ 2 J '( a+/3 ) ) 



l-p/2 



The terms in the sum above are of equal order if 



Nj = C 



[ln^- 1 )] 1 "" 2^ 2s ' +a+ ^ 



p/(pa-2) 



Then, 



j'2-1 



D 3 (jx,j 2 )<C Y 2^ [He- 1 )] 
3=31+1 



(l-oQ(2/p-l) 
2/p-a 



(10.24) 



(10.25) 



where 7 is given by equation (|10.22p and is positive. Hence, p = (2/p— a) 1 (1— a) (2/p— 1) in f| 10. 13[) . 
The weakly inhomogeneous case. Let 2s(a — 1) < (/3 + 1)(1 — 2/p). Choose % so that 

D .(j.) < Ce^rf+r, i = 1,2. (10.26) 
It is easy to see that (110. 26[) holds if j\ and j'2 are such that 

l+B , I(a=l) s 

2 n = e (™(i,«)+ffl(2«+i3+i) [We -1 ) 2 S+/ 3+i 5 2 J2 = e s *(2 S +i+/3>. (10.27)

Again, consider cases of a > 1, a = 1 and a < 1 separately.

If a > 1, then p > 2 and s* = s. Then, we obtain that D 3 (j\,j 2 ) is of the form (|l(J.19p and 
the terms are equal if Nj is given by (110, 20j) . Plugging in the value of Nj, we derive (|10.2ip with 
7 > in (|10.22p . After that, it is easy to check that p = in ([53]) and (|10.13|) . 

If a = 1, then p > 2, so that s* = s' and 2 n and 2 J2 again differ by log factor only. Then, 
D3U1J2) < C2~ 2 ^ s ' and p3} holds with p of the form (jf)3j) . 

Let a < 1. If p > 2, then ji = j 2 and D 3 (ji,j 2 ) = 0. If p < 2, then D 3 (j i,j 2 ) i s 
given by (|10.23p . and the terms in the sum are equal to each other if Nj is of the form (j!0.24j) . 
Then, D 3 (j 1 ,j 2 ) is of the form (110351) with 7 < 0. Plugging into (gfagD yields (fT(H3l) with 
p = (2/p-a)- 1 (l-a)(2/p-l). 



The moderately inhomogeneous case. If 2s(a — 1) = (/? + 1)(1 — 2/p), then 7 = in 
<nHT2\i . The value of D 3 (jx,j 2 ) is given by (fTU^TD or (fTIXm and, since j 2 - ji < Cln^" 1 ) and 



P 



2s' 



2s 



ap-2 2s' + a + p 2s + /3 + 1 ' 
equation (110. 13H holds with p of the form (I5.5P . This completes the proof. 

Proof of Lemma [J3 Note that 

A = E\\L m - / c || 2 = Ax + A 2 + A 3 + A 4 , 



where 



A 1 



Var(a mfc ), A 2 = 6 2 fc , 



A, 



A, 



keUji 



J-i 

^ EE 

j=m leUj 

J-l 

- £ Yl B l P ^ ^ ^ 

j=m leUj 

with r defined in ()5.3|) . 

By Assumption (A4) and formula (I5.ip . one has 



fell < ^ cfx^ . 



k&KZ 



(10.28) 



[10.29) 



By (|10.15p . one has A 2 < C2~ 2Js * where J is defined in (|4TT5j) . Note that if 2s(a - 1) > (p + 1)(1 - 
2/p), then j 2 in (jlO. 17|) is such that j 2 < J since 2s' > 1 and s*/s' < 1/2 due to s > max(l/p, 1/2). 
If 2s(a - 1) < (P + 1)(1 - 2/p), then j 2 is given by formula (110.271) . If 1 < p < 2, then a < 1, 
s* = s' and j 2 < J since 2s' + s'(l + /3)/s > 1 + (a + /3)/2. If p > 2, then s* = s. Moreover, due to 
2sa < 2s' + 0(1 - 2/p), one has a < s'/s + 0(1/2 - l/p)/s < (s + 1/2) /s + P/(2s) < 2 + p since 
s > 1/2. Then, 1 + (a + p)/2 < 1 + 2s + 1 + p and j 2 < J. Hence, 



A 2 < C A(e) 

where A(e) is defined in formula (|3.3p . 

In order to obtain an upper bound for A3 and A4, note that



;i0.30) 



A 3 < A31 + A 



^32, 



A 4 < A41 + A 



M2, 



:io.3ii 



31 



where 



= EE 

j=m Zel/j 

a 32 = EE 

J-l 

EE 



E(6 jfc - 6 jfe ) 2 I J] (S ifc - b jk f > 0.25 r 2 ^ k 



feet/,-. 



i feet/,-, 



E(b jk - b jk ) 2 l{Bji > 0.25 r 2 R jle ) 



feet/,-. 



A 



2^ B fl P E - 6 J fc ) 2 > °' 25r2 ^ 

j=ml£Uj \ksUj, 
J-l 

A 42 = Yl E ^ l-25r 2 ^ i/£ ) 

j=rrt ZgZTj 

By Lemma [J] with tq = r/2, we obtain 



(10.32) 



Y (bjk - b jk ? > 0.25 r 2 R jl£ ] <e> 
■ feet/,-; 



(10.33) 



Hence, since j ' < J — 1, derive 



A3, < EE 

i=mZef7j 



1/2 



1/2' 



E(6 ifc - 6 jfc ) 4 



Y (hk -bjkf > 0-25 r 2 R jl£ 



K keUji 
J-i 



.feet/,-, 



< Ce 1+a5x ^ AT 2 < c e 1+ --/3+i+min(i, Q) = o(e) 



(10.34) 



j=m k£K? 



due to assumption ()5.3|) . and since E(6jfe — 6jfc) 4 < 3e 2 C n A J - 4 . In a similar manner, since bj k = o(l), 
as e — >• 0, one has 



A 4 i < Y E 6 jV 



(10.35) 



In order to find an upper bound for A32 + A42, note that ^feefJ-; ^(bjk ~ bjk) 2 < C u C\Rji £ , so that 

J-l 

A32 + A 42 < [^uCA^-fe KB 3 i > 0.25 r 2 + Bji I(B jt < 1.25 r 2 

,j=m ZeC/j 

J-l 

^ C E E min(%,^-te) < C A(e) [ln^ 1 )]" (10.36) 



by Lemma Combination of formulae (|10.28p - (|10.36p completes the proof of (|5.4p . 
In order to prove (j5.6fl . note that 



A* = E||/ C , m - / c || 4 < A* + A^ + A 



3- 



32 



where 



A3 



O E|| ^ (a mfe - a mfc )v9 mfc || 4 , A^ = O I || ^ ^ b jk ip 



k&KZ 



J-l 



O E 



X X X] - 6 i fc ) 2 ^ ^ 0.25r 2 i^ e ) 

j=m /eC/,- k£Uji 



Observe that, by Assumption (A4), since m < J, 



A* = O 2 m ^ E(a mfc - a mfc ) 4 =0 2 m e 2 A^ fc 



= O (e 2 2 J ( 2a+2 / 3+1 +( 1 - 2Q )+) [ln^- 1 )] 1 ^ 1 ) j = O | 
For A 2 , by (|1U.15|) . we have 



2a+2^-2+(l-2a) + 
2+a+/3 



o( £ - 2 ). 



A? 



EE": 



ij 



2 



-4ms' 



o(l). 



Finally, similarly to ()10.32p . partition A3 as A3 = Ag X + Ag 2 with A^ and Ag 2 corresponding to 
I (J2keu jl (hk- b jk) 2 > 0.25 r 2 R jl£ ) and I(J2keu jt h )k > 0-25 r 2 R jle ), respectively. For A^, applying 
dUSD , and pU.33p . obtain, as e -)• 0, 



A
0[2^ gj>



|S ifc - 6 ifc | 4 I( ^ (S ifc - b jk ) 2 > 0.25 t 2 % £ ) 



feel/,-; 



h J X X [nhk-bjk\ l 



j=m keKfj 
J-l 



1/2 



P (X (kk-bjk) 2 > 0.25 r 2 % E ) 



l/2> 



1 e2 E E 



4_4^ 



x 2 2a+2/3-2+(l-2a) + ' 

O I e 2 2+ Q +/i 1 r~-2\ 



o(e-% 



similarly to A\. Finally, Ag 2 = o(e 2 ) by considerations similar to those in the case of A32 in (jl0.36D . 



Proof of Lemma [2] It is easy to see that A = E||/o m — /o, m || 2 = Ai + A 2 where 
Ai = ^2 ^2 b 2 jk , A 2 = ^2 E(a mk - a mk ) 2 , 

j=m keK 0m k£K 0m 



(10.37) 



and a mk = zl^ for k £ Aom- From characterization (|3.1[) of Besov spaces, it follows that, for any 
k, one has b 2 k < A2~ 2jfs ', and, therefore, since the number of indices in the set Ko m is finite, 

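$\Delta_1 = O\Big(\sum_{j=m}^{\infty} 2^{-2js'}\Big) = O\big(2^{-2ms'}\big).$ (10.38)

In order to find an upper bound for $\Delta_2$, note that

$\Delta_2 \le 3(\Delta_{21} + \Delta_{22} + \Delta_{23})$ (10.39)

where

$\Delta_{21} = E\|(\mathbf{A}^{(m)})^{-1}(\hat{\mathbf{c}}^{(m)} - \mathbf{c}^{(m)})\|^2,$

$\Delta_{22} = E\|(\mathbf{A}^{(m)})^{-1}\mathbf{B}^{(m)}(\hat{\mathbf{h}}^{(m)} - \mathbf{h}^{(m)})\|^2,$ (10.40)

$\Delta_{23} = \|(\mathbf{A}^{(m)})^{-1}\mathbf{r}^{(m)}\|^2.$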

Consider matrix D^" 1 ) = \J diag(A( m )) with elements ||iUm,z||, ' ^ i^om, and matrix G^ m ^ = 
(DH)- 1 aH(dH) -1 . Note that is a positive definite matrix of a finite (non-asymptotic) di- 
mension with the unit main diagonal, hence, for some positive constants Cqi and Cq2 independent 
of m, one has ||G^ m ^|| < Cqi and ||(G^) — 1 || < Cq2 which, in combination with (15.7|) . immediately 
imply (|5.8p . By (|5.8p . one has 

A 21 < ||(A (m) ) _1 || E||c (m) - c (m) || 2 < C e/>~ 4 £ |h m , fc || 2 < Cep~ 2 . (10.41) 

Now, let us examine A 22 term. It is easy to see that 

A 22 < C||(D (m) ) _1 || ||(G (m) ) _1 || E|| (D (m) ) _1 B (m) (h (m) - h (m) )|| 2 . 
It follows from Assumption (A4), formula (|5.7p and condition (|5.10p that 

A 22 < Cep^ Y, E II W m> l\\ 2 ( w m,h Wm,ki) (W m ,h ™m,k 2 ) 
leKom ki,k 2 £K^ m 

2 



< Cep~ 4 

ieK „ 



Y ( W m,l,W mtk )t rn.k 



k£KZ 



<CtPm Y E X m 2 k( W ™,l,Wm,k) 2 <Ce\J. 



Hence, due to condition (I5.9p . 



l£K 0m keKZ 



A 22 < Cep~ 2 . 

Finally, since, due to (|10.15j) . \bjk\ < A2~ JS , observe that, by (15. 8p . one has 

\2 



A 23 < C 2 A2 Pm Y ( W ™,l,QRn 



(10.42) 
(10.43) 



< C 2 A2 A 2 2- 2 ™' p" 4 



E 

1&K Q „ 



00 2?-l 

E E 1(^1, vj,k) 

j=m k=0 



< 2DC 2 A2 A 2 K 2 2~ 2ms ' 



Combination of (|10.38|) - (|10.43|) completes the proof of (I5TT2]) . 



Now, we need to show that (|5.13p holds. For this purpose, note that E||/o,m — /o,m|| 4 < A 2 +A* 
where Ai is defined in (fTtWp and A* = AJ + A?j + A^ with A| = E||(A^ m )) _1 (c( m ) - c( m )|| 4 , 
A^ = E||(A( ,n ))- 1 B( m )(h( m ) -h( m ))|| 4 and A* 3 = ||(A( m ))-i r M ||4 = A 2 g Recall that A * = 
as e — > 0. It is also easy to check that, similarly to terms A 2 i and A 22 , as e — > 0, one has 
AJ < Ce 2 p^ < Ce 2 X m 4 and 

2 



A* < Ce 2 2 2m 



E X ™,k( 



keKF, 



< Ce 2 2 2m \- A . 



Now, validity of (5.13) follows from the fact that, by (4.15),

e 2 2 2m A" 4 < Ce 2 2 J{ - 2+ ^ pn(e _1 )] I(a=1) < Ce~ 2 .

10.4 Proofs of statements in Section 6

Proof of Lemma 3. Note that, by definition of $\hat{m}$, whenever $\hat{m} > m_0$, there exists $j > m_0$ such
that $\|(\hat{f}_{m_0} - \hat{f}_j) I(\Omega_{m_0})\|^2 > \kappa^2 \varepsilon \ln(\varepsilon^{-1}) \lambda_j^{-2}$. Therefore,



J-i 

n>m ) < ]T V i with V i = F (lK/™o " fM^m )\\ 2 > K 2 sln(e- 1 )\J 2 ') . (10.44) 

j=m 



Observe that since 

||(/ mo - fjX^W < Kfoj ~ f°Mnm )\\ + Kfcj - fcM^W 

+ ||(/o,mo - /o,mo)I(^m )ll + \\(fc,m ~ fc,m )K^m )\\, 

one has the following upper bound for Vj defined in (|10.44[) : 

Vj < Pa.,,,,,, I P 0>j ,j I V,:;,.,,,, I Vr.j.j (10.45) 

where, for any mo < m < j, 

Po,j,m = P (||(/o,m " /o,m)I(O mo )|| > 0.25KTfe e ) , 
Pc,j,m = P (||(/cm " /c,m)I(^m )ll > 0.25/6^) . 

Since supp(/o, m ) f= £ ^m f° r 771 > m o 5 one has 

||(/0,m " /o,m)I(^ mo )H 2 = ll(/0,m " /o.mM^rn) f = ||/o,m " /o,m|| 2 (10.46) 

< \\i( m ) - z ( m )\\ 2 + 2DA 2 2~ 2ms ' . 
Hence, applying (|10.46[) and Lemma [6] with v = k/8, one derives 

Vo,j,m < IP (||z M - z M || > 0.25^ - AV2D2-' JS '^ (10.47) 
< P(||z( m ) — z< m > || > K-n je /S) < Ce^, (10.48) 



since, \[2~D A2 i s < k rjj e /8 for tuq < m < j if e is small enough. 
Now, let us consider the second term, P c ,j,m- Denote 

L D = min(L^ - D , L$ - D), Ud = max(U 9 + D , + D), 

C D = max(\L ip -D Q \,\L i ,-D\,\U lp + D Q \,\U i p+D\) (10.49) 

and observe that supp(</? m fc) and Q mo have non-empty intersection only if k & K mimo , where 

K m , mo = {k: \k- k 0m \ > D , 2 m ~ m °L D -U v <k- k 0rn < 2 m ' m °U D - L^} . 

Similarly, supp^/c) and Q mo have non-empty intersection only if k E Kj j7no , where 

K^ mo = {k: \k-k 0j \>D, 2i- mo L D -U^<k- k 0j < 2i- mo U D - L^} . 

Therefore, $l \in U_j^*$ where

$U_j^* = \{l : l \in U_j,\ |l| \le l^*_{j\varepsilon}\}$ with $l^*_{j\varepsilon} = C_D [\ln(\varepsilon^{-1})]^{-1} 2^{j - m_0}$, (10.50)

and, hence, for $m > m_0$, one has



||(/ c , m -/ c , m )I[(^o)l| 2 < E E &? fc + ||i>)-hM|| 2 (10.51) 



3 m k(z.Kj frri Q 



j-i 

+ E E E - bjfc) 2 I(% > r 2 i? j7e ). 

i=m «SC/* fcelT,-! 

It follows from (|1U.15|) that, if s is small enough, one has 

oo 

E E b % ^ A 2 (2C D )^ 2 M+ 2~ 2m ° s ' < 2~ 5 K 2 r, 2 m£ . (10.52) 

j=m uaK . 

Also, observe that 

l{B fl > r 2 R jl£ ) < I E " 6 ifc) 2 > 0.25r 2 R jl£ + 1 E b h > °- 25 ^ 2 ^ • ( 10 - 53 ) 

If e is small enough, then, due to (|10.52p . one has b 2 k < 0.25r 2 Rji £ , and, hence, the second indicator 
in (|10.53|) is the identical zero. Hence, it follows from formulae (|10.51|) - (|10.53|) that 

*v*m < p (iih M - h^ii > + E E p ( E iv* - M 2 > ^) • ( 10 - 54 ) 

Since components — hfj^ of vector h^" 1 ) — h^" 1 ) are normally distributed with zero means and 
variances e||t m fc|| 2 , using Assumption (A4), one obtains for the first term in (|10.54|) : 



Recalling that the sum in ()10.55j) has at most 2 J terms and applying (|1Q.33|) . derive 

K 2 1 2 2 

^c,i,m < Ce 64 ^ +Ce x ~^+P. (10.55) 

Combination of formulae (|10.44j) . (|10.45j) . (|10.47j) and (j!0.55j) and definition (|10.10p completes the 
proof. 

Proof of Theorem [3]. Observe that 

mp 

A = E[||/ m -/|| 2 = ^ E[||/ m -/|| 2 I(m = m)]+E[||/ m -/|| 2 I(m > m )} = Ax + A 2 

m=mi 

and consider terms Ai and A 2 separately. Note that for any m E [mi, mo] one has 

ML - /|| 2 < 2E||/ mo - /|| 2 + 2E||(/ m - f mo )I(x G O m )|| 2 + 2E||(/ m - f mo )I(x G ^)|| 2 
where set Q, m is defined in (|6.ip . By Theorem [2j obtain 

E||/ mo - /|| 2 < CA(e) [ln^- 1 )]" 
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where A(e) and p are defined in (|3.3[) and ()5.5[) . respectively. If rh = m < mo, then by definition of 
m, since 2s' /{2s' + a + /?) > 2s/(2s + /3 + 1) for a > 1, derive that 

E\\(f m -f mo )I(x G n m )\\ 2 < ^eHe-^X-l < C £ 5^[i n ( £ -i)]i+%=i) < C A(e) pn(e _1 )] 1+I(a=1) - 

Now, recall that £l m is defined in such a way that supp(/o jm ) G fi m for any m and C Q,j2 for 
ii > Hi so that, for m < mo, one has 

E||(/ m -/)I(xEa^)|| 2 = E||(/o, m + / c , m -/o, m -/ c , m )]I(xG^)|| 2 

= E||(/ C>m - / c , m )I(x G < E||/ C , m - / c , m || 2 < CA^Iln^- 1 )]" 

as e — > 0. Observing that 

n\(L - fmoKx G ^)|| 2 < 2 [E||(/ m - f)I(x G ^)|| 2 + E||(/ mo - /)!(* G ^)|| 2 

combining all formulae above and noting that p in (|5.5p is such that p < 1 + I(a = 1), obtain that 
Ai < C A(e) [ln(e -1 )] 1+I(a=1) as e -> 0. 

By Lemmas [TJ and El as e -> 0, one has E||/ , m - /o,m|| 4 = o (e~ 2 ) and E||/ Cjm - / c , m || 4 = 
o (e~ 2 )- Then, since, due to (|6.7p . one has d > 2 in (|6.4p and (|6.5p . Lemma [3] yields 

A 2 < ^E[||/ m -/||4 ^P(m = m > m ) = O (e^/ 2 " 1 ) = 0(e), (e ->• 0) 
which completes the proof of Theorem [3l 

10.5 Proofs of statements in Section [7] 

Proof of Proposition [TJ 

Note that Vj jk (x) = p(x)Vj^(x) where Vj jk (x) = L q(x — t)ipj k (t)dt. Denote by *f?i the Z-th 
anti-derivative of tp, i.e. 

y l (t) = [(i-i)r 1 I (t-z) i - i ^{ z )dz, vf\t)=m- (io.56) 

Observe that for < I < r one has ^i(L^) = ^>i{U^) = and also supp^z) = (L^,U^), hence, 
integrating Vj )k by parts (r — 1) times, we derive 

V jk (x) = (-l) r - 1 2-J't r - 1 /2) f * q^- 1 )(2-\2^x -k-z)) d¥ r (t). 



According to assumptions on q, the derivative g( r_1 )(3;) has one or several jump discontinuities at 
points xi, ■ ■ • ,Xl- Without loss of generality, we consider the case when L = 1 (there is only one 
jump discontinuity) and X\ = (due to periodicity, one can always achieve this by an appropriate 
shift). Hence, by integrating Vj^ by parts one more time, we obtain 



V j>k (x) = (-1Y2- 



f q( r \x - t)2^H r (2H - k)dt - 2^ 2 ^ r {2^x - j^A^^O) 
Jo 



(10.57) 



where $\Delta q^{(r-1)}(0) = q^{(r-1)}(0+) - q^{(r-1)}(0-)$ is the size of the jump of $q^{(r-1)}(z)$ at $z = 0$.
Recall that $x = 0$ is the only jump discontinuity of $q^{(r-1)}(x)$ and that

$\Theta_{jk} = \mathrm{supp}\,[\Psi_r(2^j x - k)] = \big(2^{-j}(L_\psi + k),\ 2^{-j}(U_\psi + k)\big).$

Therefore,

$V_{j,k} = V_{j,k,1} + V_{j,k,2} + V_{j,k,3},$ (10.58)

where 



V jAl (x) = (-l) r 2^ r I(x G @ c jk ) / q^(x-t)2^ 2 ^ r (2H-k)dt, 

Jo 

V jA2 (x) = (-l) r 2- jr I(xe@ jk ) [ q^(x-t)2^ 2 ^ r (2h-k)dt, 

Jo 

V jA3 (x) = (-iy +1 2-i r I(x G S jk ) A<^~1)( ) a~\x) 7?/** T {2?x - k). 
Then, \\vj, k \\ 2 = M x + M 2 + M 3 + 2M 23 where 

Mi= [V^ k .i(x)fn 2 (x)dx, i = 1,2,3, M 23 = / V^^x)^^)^)^. 
Jo Jo 

Since, for x G 0^ fc , ^^^x — fc) = and is n times continuously differentiable, one can use 
formula (j 10.57j) for Vj ;k with t\ instead of r, so that 

V jtk ,i(x) = (-l) ri 2- jri I(x e O c jk ) f q {ri) (x - t)2 j l 2 ^ ri (2H - k)dt. (10.59) 

Jo 

Therefore, we derive 

M X < 2-^+ 1 )||^)|| 2 0O ll^nlll, 

M 2 x ^ 2 (2^k) I V 2 K1 {x)dx < C2-^ 2 ^ +1 ) [\k - k oj \ a + 1] Wq^Wl ||* r ||£ 1( 
M 3 x V 2 (2~ 3 k) [ V hK2 {x) 2 dx^2^ 2r+a ^ [ \z + k-k 0j \ a y 2 r (z)dz~\ k ~ k ° j \ a + 1 



.j, K ,^~, ~~~~ , ,~ uj, ^ rV ~,-~ ~ 2i(2r+a) ' 



and |M 23 | < y/M 2 M 3 < c 2- j{ - 2r+a+1 / 2 \ Hence, due to condition 2r x + 1 > 2r + a, the value of X j>k 
is given by expression (I7.10|) . 

Validity of Assumption (A2) then follows from formula (|7.9p and the fact that functions ua k 
have bounded supports. Assumption (A4) can be verified in a similar manner. In order to show 
that Assumption (A3) holds, use decomposition (I10.58P as above and note that it is sufficient to 
verify Assumption (A3) for each of the three functions Vj,jfc,i, Vj jkj2 and V^fc, 3 . For Vj :kj2 and Vj : k,3, 
Assumption (A3) is satisfied since both functions have bounded supports. In the case of Vj,fc,i, due 
to (HEED, one has \~l\\V jjk)1 \\ 2 < C2-^ 2r ^ +1 - 2r -^[\k - k 0j \ a + l]" 1 , so that 

a SII^.mII 2 ^ c2-^ 2ri+1 - 2r ~ a h^ 1 -^ < C2~ 2 ^~ r \ 

k=0 

Therefore, Assumption (A3) holds by Cauchy inequality. Assumption (A5) is valid due to condition 
(E2). 

For completion of the proof, it remains to check conditions (5.9) - (5.11) of Lemma 2. To start
with, we need to evaluate $\rho_m$ in (5.9). For this purpose, note that under condition (7.4), one has



[\ 2 (x)dx \ / 1 2 m /V(2 r 
Jo Uo 



'z — k)q(x — z)dz 



2~ m / \y\ a 



<p(z)q(y -2 m (k - k 0m - z))dz 



dy 



2~ m C \y\ a <p(z) {g(y) - 2~ m (k - k 0m - z)q'(y - 2~ m [k - k 0m - £(*)])} dz 

Jo JLu 



dy 







2 


[ \y\ a q(y)dy 


/ ip{z)dz 




10 


J Lin 





Here R m < C[2~ m \k - k 0m \ ||g||oo Halloo + 2~ 2m \k - k 0m \ 2 \W\\l>], so that, R m is always bounded 
and, R m < C2~ m if k G Ko m . Therefore, pl^ = 2~ m . It also follows from the above that quantities 
p^(w m j,w m ^} 2 are uniformly bounded above and, thus, condition (|5.1U|) holds due to definition 
(jO) and condition (pjH . 

Now, it remains to verify condition (|5.1ip . Note that Vj ik in (|10.57|) can be expressed as 
V jjk (x) = 2?/ 2 Hj(2?x - k) where 



(_l)r- 2 -J> g W(2-i( x - t)) %(t)dt - ^ r (x)Aq^{0) 



, if x e [L^p, Uj,], 
if x e [Z^, C/^,] c , 



(_l)rx 2 -in g (n)( 2 -J( a; _ t) ) xfr^i, 
and [L^, = [0, 1] \ [L^, U^[. Then, 

\(w m ,l,Vj, k )\ <C2^ a+1 ^2- m / 2 \\q\\ oo y\\ Ll J \y + k-k 0j \ a \H 3 (y)\dy 
where W jk = J \y + k - k 0j \ a \Hj(y)\dy = W jkl + W jk2 + W} fc3 - Here, 

W jkl = [ \y + k-k i)j \ a \H ] {y)\dy<C2^-^ C \q™(z)\dz [ * |* n (t)|dt < C2^ a ~ ri \ 

J\u h y,,\c Jo Jl,,, 



W jk2 = 2~l r / \y + k-k 0j \ a 



(-'„■ 



{2-\x-t))^ r {t)dt 



dy < C2~^ r ~ a \ 



W jk3 = 2^ p |AgW(0)| f U% ' \y + k-k 0j \ a \^ r {y)\dy<C2- 
JL,|, 



■j(r-a) 



Thus, 1(^,^)1 < C2-^2-(j+ m )/ 2 and 

oo 2J-1 



j=m fc=0 



so that, due to r > 1, condition (5.11) is satisfied, which completes the proof.